ASSAYS, THERAPEUTIC AND DIAGNOSTIC METHODS AND MEANS

The present invention relates to screening methods,
peptides, mimetics, and methods of use based on the
surprising discovery and characterisation of an interaction
between known proteins and the establishment that such
interaction plays a key role in DNA repair, and thus
numerous cellular processes of interest in therapeutic
contexts. Two proteins in question are XRCC4 and DNA ligase
IV. Interaction between XRCC4 and DNA-PKcs/Ku is also
indicated.

The invention has arisen on the basis of the work of
the present inventors establishing for the first time
crucial information about XRCC4. Some information was
available on the physiological function of this protein, it
having been implicated in the Ku-associated DNA
double-strand break repair (KADR) apparatus. However, very
little was known about its biological activity and what its
role in the KADR apparatus actually is. Prior to the making
of the present invention it was not feasible to provide
assays useful as primary screens for inhibitors of XRCC4.

Furthermore, the inventors' new cloning work has
identified a yeast homologue of mammalian DNA ligase IV. No
physiological function has previously been assigned to
mammalian DNA ligase IV, but the inventors' yeast work,
including analysis of the effect of knock-out mutation in
yeast, now establishes the physiological relevance of DNA
ligase IV and thus provides indication of therapeutic
contexts in which modulation of its function can be
effected.

The work disclosed herein establishing interaction
between XRCC4 and DNA ligase IV, interaction between XRCC4
and DNA-PKcs/Ku, and also a biological role for such
interactions, now gives rise to screening methods for
identifying compounds which affect the interaction,
particularly those which interfere with it, and which may
affect or modulate particular aspects of cellular DNA repair
activity, useful in a therapeutic context, for example in
the treatment of proliferative disorders, and also in
radiotherapy. Furthermore it gives rise to the rational
design of peptides or mimetics or functional analogues which
fulfil this function.

One of the most dangerous forms of damage that can
befall a cell is the DNA double-strand break (DSB), which is
the principal lethal lesion induced by ionising radiation
and by radiomimetic agents. Consequently, cells have
evolved highly effective and complex systems for recognising
this type of DNA damage and ensuring that it is repaired
efficiently and accurately. Two major pathways have evolved
to repair DNA DSBs in eukaryotes: homologous recombination
and DNA non-homologous end-joining (NHEJ).

Much of what is currently known about DNA NHEJ in
mammalian systems has been obtained through studies of a
series of mutant rodent cell lines that were identified
originally as being hypersensitive towards ionising
radiation and which display severe defects in DNA DSB repair
(reviewed in Jeggo et al., 1995; Roth et al., 1995).
Characterisation of these cell lines has revealed that they
fall into three complementation groups, termed IR4, IR5 and
IR7. The hamster cell line XR-1 defines IR4, IR5 consists
of a number of independently isolated hamster cell mutants,
and IR7 contains the hamster cell line V3 and cells derived
from the severe combined immune-deficient (scid) mouse.
Various studies have shown that IR4, IR5 and IR7 cells are
defective in antibody and T-cell receptor V(D)J
recombination.

Considerable effort has been directed towards
establishing the nature of the gene-products defective in
cells of IR4, IR5 and IR7, and determining how they function
in DNA NHEJ. As a result of such studies, it was shown that
cells of IR5 and IR7 are deficient in components of the
DNA-dependent protein kinase (DNA-PK) (Ku80 and DNA-PKcs,
respectively). DNA-PK is a nuclear protein Ser/Thr kinase
that displays the unusual property of being activated upon
binding to DNA DSBs or other perturbations of the DNA
double-helix (Jackson, 1997). In light of the biochemical
properties of DNA-PK which have been established, an
attractive hypothesis is that this enzyme serves as a DNA
damage sensor in vivo.

In contrast to cells of IR5 and IR7, XR-1 cells of IR4
are not deficient in a DNA-PK component, as evidenced by the
fact that extracts of these cells have normal DNA
end-binding activity (Getts and Stamato, 1994; Rathmell and
Chu, 1994; Finnie et al., 1996) and DNA-PK activity (Blunt
et al., 1995), and that expression of neither Ku80 nor
DNA-PKcs complements the V(D)J recombination or
radiosensitivity defects of XR-1 cells (Taccioli et al.,
1994; Blunt et al., 1995). Instead, it has been shown that
DNA from human chromosome region 5q13-14 complements XR-1
cells, the complementing gene being termed XRCC4 (Otevrel
and Stamato, 1995).

Furthermore, Li et al. (1995) have recently identified
the XRCC4 gene through its ability to confer normal V(D)J
recombination activity on XR-1 cells and partially correct
their DSB repair defect, and have demonstrated that the
XRCC4 locus is deleted in XR-1 cells.

Interestingly, XRCC4 encodes a small 334 amino acid
residue protein of calculated molecular weight of 38 kDa,
and the human and mouse homologues of this protein have been
shown to be approximately 75% identical (Li et al., 1995).
Perhaps surprisingly, however, sequence analyses reveal that
XRCC4 is not significantly related to any
previously-characterised proteins. Therefore, although it is
clear that XRCC4 plays a crucial role in DNA DSB repair and
V(D)J recombination, the cloning and sequencing of the cDNA
for this factor has so far provided little clue to its
mechanism of action.

The Li et al. paper is the only paper published so far
on the XRCC4 protein as such. It reports that XRCC4 is not
related to any other proteins and so its sequence gives no
clear clues as to its function. Prior to the present work,
therefore, the only assays available for XRCC4 were cellular
radiosensitivity and cellular V(D)J recombination - assays
that cannot be used as primary screens for inhibitors.
Consequently, it was impossible to conceive of any
biochemical screen for the activity of this factor.

It should be noted too that the Li et al. paper does
not provide any evidence that XRCC4 is a nuclear protein
(shown herein) and discusses on page 1084 that XRCC4 has
putative sites for cytoplasmic protein tyrosine kinases.
Thus, it is clear that there really was nothing known about
how this protein might act.

The present inventors have shown that XRCC4 exists, at
least in part, in the cell nucleus and demonstrated
convincingly that it interacts with DNA ligase IV, and also
DNA-PKcs/Ku.

DNA ligases are catalysts which join together Okazaki
fragments during lagging strand DNA synthesis, complete
exchange events between homologous duplex DNA molecules, and
seal single- or double-strand breaks in the DNA that are
produced either by the direct action of DNA damaging
agents or by DNA repair enzymes removing DNA lesions (for
review, see Lindahl and Barnes, 1992). In contrast to
prokaryotic and yeast systems, where only a single species
of DNA ligase has previously been described (Johnston
and Nasmyth, 1978), four biochemically distinct DNA ligases
have been identified in mammalian cells (Tomkinson et al.,
1991; Wei et al., 1995; Robins and Lindahl, 1996). In vitro
assays, and studies of yeast and human cells containing
mutated alleles of DNA ligase I, suggest that this enzyme
joins Okazaki fragments during DNA replication (Henderson et
al., 1985; Malkas et al., 1990; Tomkinson et al., 1991;
Barnes et al., 1992; Li et al., 1994; Prigent et al., 1994;
Waga et al., 1994). Furthermore, the sensitivity of DNA
ligase I mutant cells to ultraviolet (UV) irradiation and
some DNA damaging agents suggests that DNA ligase I is
involved in nucleotide excision repair and base excision
repair (Henderson et al., 1985; Lehmann et al., 1988; Malkas
et al., 1990; Tomkinson et al., 1991; Barnes et al., 1992;
Li et al., 1994; Prigent et al., 1994; Waga et al., 1994).

Much less, however, is known about the function of the
other three mammalian DNA ligases. It is currently unclear
whether DNA ligase II and III arise from separate genes or
by alternative splicing of the same gene (Roberts et al.,
1994; Wang et al., 1994; Husain et al., 1995). However,
ligase II is induced in response to alkylation damage
(Creissen and Shall, 1982), suggesting a role in DNA repair.
Similarly, the elevated levels of a splice variant of ligase
III (ligase III-β) in spermatocytes undergoing meiotic
recombination (Chen et al., 1995; Husain et al., 1995;
Mackey et al., 1997) and the association of another splice
variant (ligase III-α) with the DNA repair protein XRCC1
(Caldecott et al., 1994; Thompson et al., 1990) are
consistent with this enzyme joining DNA strand breaks to
complete DNA recombination and repair (Jessberger et al.,
1993). Indeed, DNA ligase III, when present in a complex
with XRCC1, can reconstitute the ligation event necessary
to complete base excision repair in vitro (Kubota et al.,
1996).

A fourth enzyme, DNA ligase IV, has been purified
recently from human cells and has distinct biochemical
properties from other ligases (Robins and Lindahl, 1996).
The physiological function of mammalian ligase IV is,
however, unknown.

In prokaryotes there is only one DNA ligase, and this
enzyme catalyses all the DNA-joining events during
replication, recombination and repair (Lindahl and Barnes,
1992). Similarly, genetic and biochemical data have
suggested that there is only one DNA ligase in Saccharomyces
cerevisiae (Lindahl and Barnes, 1992), although
fractionation of yeast cell extracts has given an indication
of a second DNA ligase activity (Tomkinson et al., 1992).

The present inventors searched for DNA ligase
homologues in the S. cerevisiae genome, which was completely
sequenced recently (Goffeau et al., 1996; Oliver, 1996).
These searches identified a hitherto uncharacterised open
reading frame (ORF) with sequence similarity along its
entire length to mammalian DNA ligase IV. The experimental
section below describes the effects of disrupting this gene,
which the inventors have termed LIG4, on DNA replication,
homologous recombination, and DNA repair in response to a
variety of DNA-damaging agents. These studies show that
LIG4 plays a crucial role in DNA double-strand break repair
via the non-homologous end-joining (NHEJ) pathway but does
not have an essential role in other DNA repair pathways
studied.

Furthermore, it is shown that LIG4 functions in the
same DNA repair pathway that utilises the DNA end-binding
protein Ku. However, the phenotype of lig4 mutant yeasts is
not identical to that of yeasts disrupted for Ku function,
revealing that Ku has additional roles in genome
maintenance.

In summary, XRCC4 was known to be involved somehow in
Ku-associated DNA double-strand break repair (KADR), but its
biological activity was obscure. The present inventors have
established for the first time a biological activity of
XRCC4, namely binding to DNA ligase IV. Furthermore, the
physiological relevance of DNA ligase IV was not known. The
inventors have now established that DNA ligase IV is
important for double-strand DNA break repair via
non-homologous end joining (NHEJ) - by unexpectedly
identifying and cloning, then mutating, a yeast homologue
gene and by establishing strong interaction between XRCC4
and DNA ligase IV.

The inventors have also established that XRCC4 

Based on this and other work described below, the 
present invention in various aspects provides for modulation 
of interaction between XRCC4 and DNA ligase IV. 

Various aspects of the present invention provide for the
use of XRCC4 and DNA ligase IV in screening methods and
assays for agents which modulate interaction between XRCC4
and DNA ligase IV.

Further aspects provide for modulation of interaction
between XRCC4 and DNA-PKcs/Ku and use of these molecules in
screening methods and assays for agents which modulate
interaction between XRCC4 and DNA-PKcs/Ku. For simplicity,
much of the present disclosure refers to XRCC4 and DNA
ligase IV. However, unless the context requires otherwise,
every such reference should be taken to be equally
applicable to the interaction between XRCC4 and DNA-PKcs/Ku.

Methods of obtaining agents able to modulate
interaction between XRCC4 and DNA ligase IV (or, it must be
remembered, XRCC4 and DNA-PKcs/Ku) include methods wherein a
suitable end-point is used to assess interaction in the
presence and absence of a test substance. Detailed
disclosure in this respect is included below. It is worth
noting, however, that combinatorial library technology
provides an efficient way of testing a potentially vast
number of different substances for ability to modulate
binding to and/or activity of a polypeptide. Such libraries
and their use are known in the art, for all manner of
natural products, small molecules and peptides, among
others. The use of peptide libraries may be preferred in
certain circumstances.
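By way of illustration only, the comparison of an end-point
in the presence and absence of a test substance can be
sketched as follows. This is a minimal sketch, not an assay
described in this specification: the function name, the
optional background correction and the numerical readings
are all hypothetical, and the end-point is assumed to be any
signal that rises with the amount of interaction.

```python
def percent_inhibition(signal_with, signal_without, background=0.0):
    """Percentage reduction of an interaction signal caused by a test substance.

    signal_with    -- end-point reading in the presence of the test substance
    signal_without -- end-point reading in its absence (control)
    background     -- reading with no interaction at all (hypothetical control)
    """
    window = signal_without - background
    if window <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (signal_without - signal_with) / window


# Hypothetical readings in arbitrary units, for illustration only.
result = percent_inhibition(signal_with=30.0, signal_without=110.0, background=10.0)
print(result)  # 80.0
```

Under these assumptions a positive value indicates a
substance that interferes with the interaction, and a
negative value one that increases the measured signal.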

Appropriate agents may be obtained, designed and used 
for any of a variety of purposes. 

One is anti-tumour or anti-cancer therapy, particularly
augmentation of radiotherapy or chemotherapy. Ionising
radiation and radiomimetic drugs are commonly used to treat
cancer by inflicting DNA damage. Cells deficient in DNA
repair, particularly the KADR pathway, are hypersensitive to
ionising radiation and radiomimetics. Evidence provided
herein shows the KADR pathway involves XRCC4 and DNA ligase
IV, indicating that inhibition of their function, e.g. by
inhibiting their interaction, will have an effect on the
KADR pathway, DNA repair and cellular sensitivity to
ionising radiation and radiomimetics.

Another is the potentiation of gene targeting and gene
therapy. Inhibition of KADR may be used to increase
efficiencies of gene targeting, of interest and ultimate use
in gene therapy. Two ways exist for repairing DNA
double-stranded breaks (DSBs). One is through the process of
illegitimate recombination (also known as DNA non-homologous
end-joining or NHEJ) and this is catalysed by the KADR
system now known to involve XRCC4 and DNA ligase IV. The
other system is the process of homologous recombination,
whereby the damaged DNA molecule exchanges information with
an undamaged homologous partner DNA molecule. In mammalian
cells, the illegitimate pathway tends to predominate.
Inhibiting the KADR system will increase the proportion of
DSBs repaired by homologous recombination. Thus, anti-KADR
factor agents, including those provided in accordance
with the present invention, will have this effect.
Homologous gene targeting is used in making knock-out mice
and other transgenic animals but it is not very efficient,
so increasing this efficiency in accordance with the present
invention will be highly beneficial. Ultimately, gene
therapists wish to replace the mutated gene precisely with a
functional one. At present the priority is just to get the
functional gene to integrate anywhere in the genome, but the
long-term aim is for integration at the right site. KADR
(e.g. XRCC4 and/or ligase IV, or XRCC4 and/or DNA-PKcs/Ku)
inhibitors therefore have great therapeutic potential in
such a context.

A further, related, purpose is in anti-retroviral
therapy, since DNA repair pathways such as involving KADR
and the components XRCC4 and DNA ligase IV are involved in
effecting retroviral and retrotransposon integration into
the genome of a host cell. Retroviruses are of considerable
risk to the health of humans and animals, causing, inter
alia, AIDS, various cancers and human adult T-cell
leukaemia/lymphoma. Integration of retroviral DNA into the
genome is essential for efficient viral propagation and may
be targeted by inhibition of DNA repair pathway components.

Additionally, modulators of KADR components such as
XRCC4 and DNA ligase IV, DNA-PKcs/Ku, may be used in
modulation of immune system function, since such factors are
required for generation of mature immunoglobulin and T-cell
receptor genes by site-specific V(D)J recombination.

Compounds which stabilise the interaction between two
components, such as XRCC4 and DNA ligase IV, or XRCC4 and
DNA-PKcs/Ku, and which may up-regulate activity, may be
screened for using assays in which conditions are too harsh
for the relevant interaction. Agents which stabilise the
interaction may be identified. One alternative is to screen
for substances that enhance DNA ligase IV catalytic
activity, which may be determined as discussed elsewhere.
An up-regulator of activity may be used to potentiate DNA
repair further, and this may be in normal individuals, with
possible long-term beneficial effects bearing in mind that
many of the common manifestations of ageing arise through
the gradual and inexorable accumulation of mutations in
somatic cells. Up-regulators may be used in treating
patients who are debilitated in the KADR pathway or other
DNA repair pathway.

Interaction between XRCC4 and DNA ligase IV, or XRCC4
and DNA-PKcs/Ku may be inhibited by inhibition of the
production of the relevant protein. For instance,
production of one or more of these components may be
inhibited by using appropriate nucleic acid to influence
expression by antisense regulation. The use of anti-sense
genes or partial gene sequences to down-regulate gene
expression is now well-established. Double-stranded DNA is
placed under the control of a promoter in a "reverse
orientation" such that transcription of the "anti-sense"
strand of the DNA yields RNA which is complementary to
normal mRNA transcribed from the "sense" strand of the
target gene. The complementary anti-sense RNA sequence is
thought then to bind with mRNA to form a duplex, inhibiting
translation of the endogenous mRNA from the target gene into
protein. Whether or not this is the actual mode of action
is still uncertain. However, it is established fact that
the technique works.

Another possibility is that nucleic acid is used which
on transcription produces a ribozyme, able to cut nucleic
acid at a specific site - thus also useful in influencing
gene expression. Background references for ribozymes
include Kashani-Sabet and Scanlon, 1995, Cancer Gene
Therapy, 2(3): 213-223, and Mercola and Cohen, 1995, Cancer
Gene Therapy, 2(1): 47-59.

Thus, various methods and uses of modulators,
particularly inhibitors, of XRCC4 and DNA ligase IV, or
XRCC4 and DNA-PKcs/Ku, interaction and/or activity are
provided as further aspects of the present invention. The
purpose of disruption, interference with or modulation of
interaction between XRCC4 and DNA ligase IV, and/or XRCC4
and DNA-PKcs/Ku, may be to modulate any activity mediated by
virtue of such interaction, as discussed above and further
below.

Brief Description of the Figures 

Figure 1 illustrates co-purification of XRCC4 and DNA
ligase IV from HeLa cells. Using the protocol described
previously (Robins, 1996), DNA ligase IV was purified from
HeLa cells. Fractions collected from the chromatographic
columns were analysed on SDS-polyacrylamide gels and
specific binding of antibodies was detected by immunoblots.
The amount of specified protein in each fraction was
quantitated from densitometric scans of the gels. The
amount of each protein in analysed fractions as a proportion
of its total amount was plotted for the samples separated by
gel chromatography fractionation (Figure 1A), followed by a
Mono S column (Figure 1B). The asterisk in Figure 1A
signifies the fraction loaded on to the Mono S column.

Figure 2 shows a schematic representation of the
various eukaryotic DNA ligases. The conserved "core" domain
is divided into two regions: black box, very high levels of
conservation; diagonally hatched box, less conserved region
of "core" domain; chequered box, putative BRCT domain (BRCA1
C-terminus, Koonin, 1996; Callebaut, 1997).

Figure 3. YOR005c encodes a homologue of mammalian DNA
ligase IV.

Figure 3A: Amino acid sequence similarities between S.
cerevisiae Lig4p (scLIG4; the product of the YOR005c ORF)
and human DNA ligase IV (hLIGIV). The alignment was
generated using the PILEUP programme on the GCG (Genetics
Computer Group, Wisconsin) package, and identical and
similar amino acid residues are indicated by reverse shading
and grey shading, respectively, using the BOXSHADE
programme. Amino acid residues are numbered from the amino
termini of the full-length polypeptides. Gaps were
introduced for maximum alignment. The active site lysine
residue is indicated with an arrowhead. The "core" conserved
region of DNA ligases of eukaryotes and eukaryotic viruses
is delineated with a bar.

Figure 3B: Phylogenetic tree showing the evolutionary
relationship of DNA ligases from eukaryotes, eukaryotic
viruses and T7 bacteriophage. The phenogram was generated
using the PHYLIP package with the aligned "core" conserved
sequence of each protein as designated in Figure 3A using
the UPGMA method. Accession numbers are as follows: A.
thaliana I (X97924); C. albicans (X95001); C. elegans I
(Z73970); Fowlpox virus (U00761); H. sapiens I (M36067), III
(X84740) and IV (X83441); M. musculus I (U04674); Rabbit
fibroma virus (Z29716); S. cerevisiae I (X03246), IV
(Z74913); S. pombe I (X05107); T7 bacteriophage (P00969);
Vaccinia virus (X16512); X. laevis I (L43496).

Figure 3C shows a schematic representation of the
putative domain structure (based on amino acid sequence
homologies) of various DNA ligases. White box, "core"
conserved ligase domain; black box, active site residues;
shaded box, N-terminal conserved domain of eukaryotic and
eukaryotic viral DNA ligases; striped box, zinc-finger-like
DNA-binding domain; diagonally hatched box, putative BRCT
domain (BRCA1 C-terminus; Koonin et al., 1996; Callebaut and
Mornon, 1997).

Figure 4 shows that LIG4 functions in the Ku-dependent
pathway for repairing ionising radiation-induced DNA damage.
The sensitivity of various yeast strains to killing by
ionising radiation was judged by exposure to various
radiation doses. Error bars are not shown for simplicity;
standard deviation is < 5% of each value point.

Figure 4A: Unlike rad52 mutants, lig4 mutant yeast are
as resistant to ionising radiation as parental strains: lig4
mutants were not significantly more sensitive even at high
doses (up to 45 kRad).

Figure 4B: As in the case for YKU70, disruption of LIG4
hypersensitises yeast to ionising radiation in rad52 mutant
backgrounds. Furthermore, lig4/yku70/rad52 triple mutants
are not appreciably more radiosensitive than are yku70/rad52
double mutant or lig4/rad52 double mutant strains,
indicating that Lig4p and Yku70p function on the same
RAD52-independent repair pathway.

Figure 5A shows a plasmid map of the shuttle vector
pBTM116 showing the locations of the yeast selectable marker
TRP1, the β-lactamase gene, the ADH1 promoter and the
restriction enzyme cleavage sites.

Figure 5B shows that disruption of LIG4 results in a
dramatic reduction in the ability to repair restriction
enzyme generated cohesive DNA DSBs in plasmid (pBTM116) DNA.
Cells for each strain were transformed in parallel with
supercoiled pBTM116 or with an equivalent amount of pBTM116
that had been digested to completion with EcoRI or PstI, as
indicated. For each experiment, the value plotted is the
number of transformants obtained with the linear plasmid
expressed as a percentage of the number obtained for
supercoiled DNA. Each experiment was repeated at least three
times and, in each case, cells were plated and counted in
duplicate.
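The value plotted in Figure 5B can be illustrated with a
short calculation. The following is a minimal sketch only:
the function name is hypothetical, averaging of duplicate
plate counts is an assumption about how duplicates are
combined, and the colony counts are invented for
illustration and are not data from this specification.

```python
from statistics import mean


def repair_efficiency(linear_counts, supercoiled_counts):
    """Transformants from linearised plasmid as a percentage of those
    obtained in parallel with supercoiled plasmid.

    Each argument is a list of colony counts from replicate plates;
    replicates are averaged before the ratio is taken (an assumption).
    """
    supercoiled = mean(supercoiled_counts)
    if supercoiled <= 0:
        raise ValueError("supercoiled control must give transformants")
    return 100.0 * mean(linear_counts) / supercoiled


# Invented duplicate plate counts, for illustration only.
example = repair_efficiency([410, 390], [1000, 1000])
print(example)  # 40.0
```

Normalising to the supercoiled control in this way corrects
for differences in transformation efficiency between
strains, so that the percentage reflects repair of the
linearised plasmid rather than DNA uptake.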

Figure 5C shows that in the absence of functional LIG4,
the cohesive DNA termini are repaired by an inefficient
error-prone DNA repair pathway. The plasmid pBTM116 contains
the ADH1 promoter and some repair products have been
generated by the gap-repair process involving the
chromosomal ADH1 gene; the striped region represents DNA
derived from the genomic locus. Gaps indicate deletions. A
single letter in small case represents a single plasmid
analysed in this study (representative transformants are
represented).

Figure 6 shows a model in which XRCC4 serves as a
molecular bridge to target DNA ligase IV to a DNA DSB. In
this model (which is proposed without in any way limiting
the nature or scope of any aspect of the present invention
or embodiment thereof), Ku binds to the free DNA ends and
recruits DNA-PKcs, activating the kinase catalytic function
of the latter. A DNA ligase IV/XRCC4 complex is then
recruited to the DNA DSB. One or more additional components
may be involved: some possibilities are indicated by means
of question marks. Active DNA-PK may also trigger DNA
damage signalling events or may phosphorylate other DNA DSB
repair components, such as XRCC4, thus regulating their
activity. The stoichiometry of the XRCC4-DNA ligase IV
complex is for the purpose of illustration only. DNA ligase
IV may interact with XRCC4 anywhere within residues 550-884
of human XRCC4, e.g. at or between the BRCT domains.

Figure 7 shows the amino acid sequence and encoding
nucleotide sequence for human XRCC4. Translation begins at
the start site indicated by the arrow.

Figure 8 shows the amino acid sequence and encoding
nucleotide sequence for human DNA ligase IV. Translation
begins at the start site indicated by the arrow.

Figure 9 shows the amino acid sequence and encoding
nucleotide sequence for S. cerevisiae LIG4. Translation
begins at the start site indicated by the arrow.

Figure 10A shows the amino acid sequence of human
DNA-PKcs. Hartley et al. originally provided the sequence,
though lacking an intron. Poltoratsky et al. provided a
partial sequence including the intron not included in the
Hartley et al. sequence. The sequence shown, available from
GenBank, is complete.

Figure 10B shows the encoding nucleotide sequence of
human DNA-PKcs.

Figure 11 shows the amino acid sequence and encoding 
nucleotide sequence of the Ku80 subunit of human Ku. 

Figure 12 shows the amino acid sequence and encoding
nucleotide sequence of the Ku70 subunit of human Ku.
Reference and GenBank accession details are also
included.

All documents mentioned anywhere in this specification
are incorporated by reference.

The present invention in various aspects provides for
modulating, interfering with or interrupting interaction
between the XRCC4 protein and DNA ligase IV, using an
appropriate agent. The present invention also provides in
analogous aspects for modulating, interfering with or
interrupting interaction between the XRCC4 protein and
DNA-PKcs/Ku, using an appropriate agent.

Such an agent capable of modulating interaction between
XRCC4 and DNA ligase IV may be capable of blocking binding
between a site located within amino acid residues 550-884 of
human XRCC4, which may be at one or other or both of the
BRCT domains (discussed further below), or between these
domains, and a site on human DNA ligase IV. The site on
XRCC4 may be between amino acid residues 591-676, between
amino acid residues 728-844 or between residues 677-727.
The full amino acid sequence of the XRCC4 protein has been
elucidated and is set out in Li et al. (Cell (1995) 83,
1079-1089) which is incorporated herein by reference, and of
which the amino acid residue numbering is used, and in
Figure 7, along with the encoding nucleic acid sequence.
The DNA ligase IV amino acid and nucleotide coding sequences
are given in Wei et al., of which the amino acid residue
numbering is used, and Figure 8, with the yeast LIG4
sequences being shown in Figure 9.

Such agents may be identified by screening techniques
which involve determining whether an agent under test
inhibits or disrupts the binding of XRCC4 protein or a
suitable fragment thereof (e.g. including amino acid
residues 550-884, residues 591-676, residues 728-844 or
residues 677-727, or a smaller fragment of any of these
regions) of human XRCC4, with DNA ligase IV or a fragment
thereof, or a suitable analogue, fragment or variant
thereof.

Suitable fragments of XRCC4 or DNA ligase IV include
those which include residues which interact with the
counterpart protein. Smaller fragments, and analogues and
variants of this fragment may similarly be employed, e.g. as
identified using techniques such as deletion analysis or
alanine scanning.
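As a minimal sketch of how candidate sequences for deletion
analysis and alanine scanning might be enumerated before
binding is tested, consider the following. The function
names and the five-residue example peptide are hypothetical
and are not taken from XRCC4 or DNA ligase IV; the code only
lists sequences and says nothing about which would bind.

```python
def alanine_scan(peptide):
    """List single-residue alanine substitutions of a peptide.

    Returns (position, original_residue, variant_sequence) tuples,
    with positions numbered from 1. Residues that are already
    alanine are skipped, since substitution would change nothing.
    """
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue
        variants.append((i + 1, residue, peptide[:i] + "A" + peptide[i + 1:]))
    return variants


def terminal_deletions(peptide, min_length):
    """List N-terminal and C-terminal truncations down to min_length."""
    steps = range(1, len(peptide) - min_length + 1)
    n_terminal = [peptide[i:] for i in steps]
    c_terminal = [peptide[:len(peptide) - i] for i in steps]
    return n_terminal, c_terminal


# Hypothetical five-residue peptide in one-letter code.
for position, original, variant in alanine_scan("MKLAV"):
    print(position, original, variant)
print(terminal_deletions("MKLAV", 3))
```

Each listed variant would then be tested for interaction
with the counterpart protein; loss of binding on a given
substitution or truncation points to residues needed for the
interaction.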

Thus, the present invention provides a peptide fragment
of XRCC4 which is able to bind DNA ligase IV and/or inhibit
interaction between XRCC4 and DNA ligase IV, and provides a
peptide fragment of DNA ligase IV which is able to bind
XRCC4 and/or inhibit interaction between DNA ligase IV
and XRCC4, such peptide fragments being obtainable by means
of deletion analysis and/or alanine scanning of the relevant
protein - making an appropriate mutation in sequence,
bringing together a mutated fragment of one of the proteins
with the other or a fragment thereof and determining
interaction. In preferred embodiments, the peptide is
short, as discussed below, and may be a minimal portion that
is able to interact with the relevant counterpart protein
and/or inhibit the relevant interaction.

Of course, similar considerations apply to XRCC4 and
DNA-PKcs/Ku interacting portions.

Screening methods and assays are discussed in further
detail below.

One class of agents that can be used to disrupt the
binding of XRCC4 and DNA ligase IV are peptides based on the
sequence motifs of XRCC4 or DNA ligase IV that interact with
counterpart DNA ligase IV or XRCC4. Such peptides tend to
be short, and may be about 40 amino acids in length or less,
preferably about 35 amino acids in length or less, more
preferably about 30 amino acids in length or less, more
preferably about 25 amino acids or less, more preferably
about 20 amino acids or less, more preferably about 15 amino
acids or less, more preferably about 10 amino acids or less,
or 9, 8, 7, 6, 5 or less in length. The present invention
also encompasses peptides which are sequence variants or
derivatives of a wild type XRCC4 or DNA ligase IV sequence,
but which retain ability to interact with DNA ligase IV or
XRCC4 (respectively, as the case may be) and/or ability to
modulate interaction between XRCC4 and DNA ligase IV.

Instead of using a wild-type XRCC4 or DNA ligase IV
fragment, a peptide or polypeptide may include an amino acid
sequence which differs by one or more amino acid residues
from the wild-type amino acid sequence, by one or more of
addition, insertion, deletion and substitution of one or
more amino acids. Thus, variants, derivatives, alleles,
mutants and homologues, e.g. from other organisms, are
included.

Preferably, the amino acid sequence shares homology
with a fragment of the relevant XRCC4 or DNA ligase
IV fragment sequence shown, preferably at least about 30%, or
40%, or 50%, or 60%, or 70%, or 75%, or 80%, or 85%
homology, or at least about 90% or 95% homology. Thus, a
peptide fragment of XRCC4 or DNA ligase IV may include 1, 2,
3, 4, 5, greater than 5, or greater than 10 amino acid
alterations such as substitutions with respect to the wild-
type sequence.

A derivative of a peptide for which the specific
sequence is disclosed herein may be in certain embodiments
the same length or shorter than the specific peptide. In
other embodiments the peptide sequence or a variant thereof
may be included in a larger peptide, as discussed above,
which may or may not include an additional portion of XRCC4
or DNA ligase IV. 1, 2, 3, 4 or 5 or more additional amino
acids, adjacent to the relevant specific peptide fragment in
XRCC4 or DNA ligase IV, or heterologous thereto, may be
included at one end or both ends of the peptide.

(It should not be forgotten that references to XRCC4
and DNA ligase IV apply equally to XRCC4 and DNA-PKcs/Ku.)

As is well-understood, homology at the amino acid level
is generally in terms of amino acid similarity or identity.
Similarity allows for "conservative variation", i.e.
substitution of one hydrophobic residue such as isoleucine,
valine, leucine or methionine for another, or the
substitution of one polar residue for another, such as
arginine for lysine, glutamic acid for aspartic acid, or
glutamine for asparagine. Similarity may be as defined and
determined by the TBLASTN program, of Altschul et al. (1990)
J. Mol. Biol. 215: 403-10, which is in standard use in the
art. Homology may be over the full-length of the relevant
peptide or over a contiguous sequence of about 5, 10, 15,
20, 25, 30, 35, 50, 75, 100 or more amino acids, compared
with the relevant wild-type amino acid sequence.
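
To make the distinction between identity and similarity concrete,
the following minimal sketch scores two peptides of equal length,
position by position, using only the conservative groupings named
above. It is a simplification under stated assumptions: it is not
the TBLASTN program, it performs no alignment and allows no gaps,
and the grouping table and function names are illustrative only.

```python
# Conservative groupings taken from the examples in the text above.
CONSERVATIVE_GROUPS = [
    set("IVLM"),  # hydrophobic: isoleucine, valine, leucine, methionine
    set("RK"),    # arginine / lysine
    set("ED"),    # glutamic acid / aspartic acid
    set("QN"),    # glutamine / asparagine
]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b fall within the same conservative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def percent_identity_and_similarity(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Compare two equal-length, ungapped peptides position by position."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(a == b for a, b in zip(seq_a, seq_b))
    similar = sum(a == b or is_conservative(a, b) for a, b in zip(seq_a, seq_b))
    n = len(seq_a)
    return 100.0 * identical / n, 100.0 * similar / n


if __name__ == "__main__":
    # Hypothetical five-residue peptides, for illustration only.
    print(percent_identity_and_similarity("IVKDQ", "LVRDA"))
```

In this example two of five positions are identical and a further
two are conservative substitutions, so similarity exceeds identity.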

As noted, variant peptide sequences and peptide and
non-peptide analogues and mimetics may be employed, as
discussed further below.

Various aspects of the present invention provide a
substance, which may be a single molecule or a composition
including two or more components, which includes a peptide
fragment of XRCC4 or DNA ligase IV which includes a sequence
as recited in Figure 7 or 8 respectively, a peptide
consisting essentially of such a sequence, a peptide
including a variant, derivative or analogue sequence, or a
non-peptide analogue or mimetic which has the ability to
bind XRCC4 or DNA ligase IV and/or modulate, disrupt or
interfere with interaction between XRCC4 and DNA ligase IV.

Variants include peptides in which individual amino
acids can be substituted by other amino acids which are
closely related as is understood in the art and indicated
above.

Non-peptide mimetics of peptides are discussed further
below.

As noted, a peptide according to the present invention
and for use in various aspects of the present invention may
include or consist essentially of a fragment of XRCC4 or DNA
ligase IV as disclosed, such as a fragment whose sequence is
shown in Figure 7 or Figure 8, respectively. Where one or
more additional amino acids are included, such amino acids
may be from XRCC4 or DNA ligase IV or may be heterologous or
foreign to XRCC4 or DNA ligase IV. A peptide may also be
included within a larger fusion protein, particularly where
the peptide is fused to a non-XRCC4 or DNA ligase IV (i.e.
heterologous or foreign) sequence, such as a polypeptide or
protein domain.

The invention also includes derivatives of the
peptides, including the peptide linked to a coupling
partner, e.g. an effector molecule, a label, a drug, a toxin
and/or a carrier or transport molecule, and/or a targeting
molecule such as an antibody or binding fragment thereof or
other ligand. Techniques for coupling the peptides of the
invention to both peptidyl and non-peptidyl coupling
partners are well known in the art. In one embodiment the
carrier molecule is a 16 aa peptide sequence derived from
the homeodomain of Antennapedia (e.g. as sold under the name
"Penetratin"), which can be coupled to a peptide via a
terminal Cys residue. The "Penetratin" molecule and its
properties are described in WO 91/18981.

Peptides may be generated wholly or partly by chemical
synthesis. The compounds of the present invention can be
readily prepared according to well-established, standard
liquid or, preferably, solid-phase peptide synthesis
methods, general descriptions of which are broadly available
(see, for example, J.M. Stewart and J.D. Young, Solid
Phase Peptide Synthesis, 2nd edition, Pierce Chemical
Company, Rockford, Illinois (1984); M. Bodanszky and A.
Bodanszky, The Practice of Peptide Synthesis, Springer
Verlag, New York (1984); and Applied Biosystems 430A Users
Manual, ABI Inc., Foster City, California), or they may be
prepared in solution, by the liquid phase method or by any
combination of solid-phase, liquid phase and solution
chemistry, e.g. by first completing the respective peptide
portion and then, if desired and appropriate, after removal
of any protecting groups being present, by introduction of
the residue X by reaction of the respective carbonic or
sulfonic acid or a reactive derivative thereof.

Another convenient way of producing a peptidyl molecule 
according to the present invention (peptide or polypeptide) 
is to express nucleic acid encoding it, by use of nucleic 
acid in an expression system. 

Accordingly the present invention also provides in
various aspects nucleic acid encoding the polypeptides and
peptides of the invention.

Generally, nucleic acid according to the present
invention is provided as an isolate, in isolated and/or
purified form, or free or substantially free of material
with which it is naturally associated, such as free or
substantially free of nucleic acid flanking the gene in the
human genome, except possibly one or more regulatory
sequence(s) for expression. Nucleic acid may be wholly or
partially synthetic and may include genomic DNA, cDNA or
RNA. Where nucleic acid according to the invention includes
RNA, reference to the sequence shown should be construed as
reference to the RNA equivalent, with U substituted for T.

Nucleic acid sequences encoding a polypeptide or
peptide in accordance with the present invention can be
readily prepared by the skilled person using the information
and references contained herein and techniques known in the
art (for example, see Sambrook, Fritsch and Maniatis,
"Molecular Cloning, A Laboratory Manual", Cold Spring Harbor
Laboratory Press, 1989, and Ausubel et al., Short Protocols
in Molecular Biology, John Wiley and Sons, 1992), given the
nucleic acid sequence and clones available. These
techniques include (i) the use of the polymerase chain
reaction (PCR) to amplify samples of such nucleic acid, e.g.
from genomic sources, (ii) chemical synthesis, or (iii)
preparing cDNA sequences. DNA encoding XRCC4 or DNA ligase
IV fragments may be generated and used in any suitable way
known to those of skill in the art, including by taking
encoding DNA, identifying suitable restriction enzyme
recognition sites either side of the portion to be
expressed, and cutting out said portion from the DNA. The
portion may then be operably linked to a suitable promoter
in a standard commercially available expression system.
Another recombinant approach is to amplify the relevant
portion of the DNA with suitable PCR primers. Modifications
to the XRCC4 or DNA ligase IV sequences can be made, e.g.
using site directed mutagenesis, to lead to the expression
of modified XRCC4 or DNA ligase IV peptide or to take
account of codon preference in the host cells used to
express the nucleic acid.
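
The step of identifying restriction enzyme recognition sites either
side of a portion to be expressed lends itself to a simple search.
The sketch below is illustrative only: it assumes the DNA is held as
a plain string, uses three well-known palindromic recognition
sequences (EcoRI, BamHI, HindIII) chosen purely as examples, and
introduces hypothetical function names. It does not check for sites
lying inside the portion itself, which would also need to be
excluded in practice.

```python
# Recognition sequences of three commonly used restriction enzymes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(dna: str) -> dict[str, list[int]]:
    """Map each enzyme to the 0-based start positions of its site."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start)
            start = dna.find(site, start + 1)
        hits[enzyme] = positions
    return hits


def flanking_sites(dna: str, region_start: int, region_end: int):
    """Return (upstream, downstream) lists of (enzyme, position) for sites
    lying wholly outside the half-open region [region_start, region_end)."""
    upstream, downstream = [], []
    for enzyme, positions in find_sites(dna).items():
        site_length = len(RECOGNITION_SITES[enzyme])
        for position in positions:
            if position + site_length <= region_start:
                upstream.append((enzyme, position))
            elif position >= region_end:
                downstream.append((enzyme, position))
    return upstream, downstream


if __name__ == "__main__":
    # Hypothetical construct: EcoRI site, short coding portion, BamHI site.
    construct = "GAATTC" + "ATGAAACCC" + "GGATCC"
    print(flanking_sites(construct, region_start=6, region_end=15))
```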

In order to obtain expression of the nucleic acid
sequences, the sequences can be incorporated in a vector
having one or more control sequences operably linked to the
nucleic acid to control its expression. The vectors may
include other sequences such as promoters or enhancers to
drive the expression of the inserted nucleic acid, nucleic
acid sequences so that the polypeptide or peptide is
produced as a fusion and/or nucleic acid encoding secretion
signals so that the polypeptide produced in the host cell is
secreted from the cell. Polypeptide can then be obtained by
transforming the vectors into host cells in which the vector
is functional, culturing the host cells so that the
polypeptide is produced and recovering the polypeptide from
the host cells or the surrounding medium. Prokaryotic and
eukaryotic cells are used for this purpose in the art,
including strains of E. coli, yeast, and eukaryotic cells
such as COS or CHO cells.

Thus, the present invention also encompasses a method
of making a polypeptide or peptide (as disclosed), the
method including expression from nucleic acid encoding the
polypeptide or peptide (generally nucleic acid according to
the invention). This may conveniently be achieved by
growing a host cell in culture, containing such a vector,
under appropriate conditions which cause or allow expression
of the polypeptide. Polypeptides and peptides may also be
expressed in in vitro systems, such as reticulocyte lysate.

Systems for cloning and expression of a polypeptide in
a variety of different host cells are well known. Suitable
host cells include bacteria, eukaryotic cells such as
mammalian and yeast, and baculovirus systems. Mammalian
cell lines available in the art for expression of a
heterologous polypeptide include Chinese hamster ovary
cells, HeLa cells, baby hamster kidney cells, COS cells and
many others. A common, preferred bacterial host is E. coli.

Suitable vectors can be chosen or constructed,
containing appropriate regulatory sequences, including
promoter sequences, terminator fragments, polyadenylation
sequences, enhancer sequences, marker genes and other
sequences as appropriate. Vectors may be plasmids, viral
e.g. 'phage, or phagemid, as appropriate. For further
details see, for example, Molecular Cloning: a Laboratory
Manual: 2nd edition, Sambrook et al., 1989, Cold Spring
Harbor Laboratory Press. Many known techniques and
protocols for manipulation of nucleic acid, for example in
preparation of nucleic acid constructs, mutagenesis,
sequencing, introduction of DNA into cells and gene
expression, and analysis of proteins, are described in
detail in Current Protocols in Molecular Biology, Ausubel et
al. eds., John Wiley & Sons, 1992.

Thus, a further aspect of the present invention
provides a host cell containing heterologous nucleic acid as
disclosed herein.

The nucleic acid of the invention may be integrated
into the genome (e.g. chromosome) of the host cell.
Integration may be promoted by inclusion of sequences which
promote recombination with the genome, in accordance with
standard techniques. The nucleic acid may be on an extra-
chromosomal vector within the cell, or otherwise
identifiably heterologous or foreign to the cell.

A still further aspect provides a method which includes
introducing the nucleic acid into a host cell. The
introduction, which may (particularly for in vitro
introduction) be generally referred to without limitation as
"transformation", may employ any available technique. For
eukaryotic cells, suitable techniques may include calcium
phosphate transfection, DEAE-Dextran, electroporation,
liposome-mediated transfection and transduction using
retrovirus or other virus, e.g. vaccinia or, for insect
cells, baculovirus. For bacterial cells, suitable
techniques may include calcium chloride transformation,
electroporation and transfection using bacteriophage. As an
alternative, direct injection of the nucleic acid could be
employed.

Marker genes such as antibiotic resistance or 
sensitivity genes may be used in identifying clones 
containing nucleic acid of interest, as is well known in the 
art. 

The introduction may be followed by causing or allowing
expression from the nucleic acid, e.g. by culturing host
cells (which may include cells actually transformed although
more likely the cells will be descendants of the transformed
cells) under conditions for expression of the gene, so that
the encoded polypeptide (or peptide) is produced. If the
polypeptide is expressed coupled to an appropriate signal
leader peptide it may be secreted from the cell into the
culture medium. Following production by expression, a
polypeptide or peptide may be isolated and/or purified from
the host cell and/or culture medium, as the case may be, and
subsequently used as desired, e.g. in the formulation of a
composition which may include one or more additional
components, such as a pharmaceutical composition which
includes one or more pharmaceutically acceptable excipients,
vehicles or carriers (e.g. see below).

Introduction of nucleic acid encoding a peptidyl
molecule according to the present invention may take place
in vivo by way of gene therapy, to disrupt or interfere with
interaction between XRCC4 and DNA ligase IV.

Thus, a host cell containing nucleic acid according to
the present invention, e.g. as a result of introduction of
the nucleic acid into the cell or into an ancestor of the
cell and/or genetic alteration of the sequence endogenous to
the cell or ancestor (which introduction or alteration may
take place in vivo or ex vivo), may be comprised (e.g. in
the soma) within an organism which is an animal,
particularly a mammal, which may be human or non-human, such
as rabbit, guinea pig, rat, mouse or other rodent, cat, dog,
pig, sheep, goat, cattle or horse, or which is a bird, such
as a chicken. Genetically modified or transgenic animals or
birds comprising such a cell are also provided as further
aspects of the present invention.

This may have a therapeutic aim. (Gene therapy is
discussed below.) Also, the presence of a mutant, allele,
derivative or variant sequence within cells of an organism,
particularly when in place of a homologous endogenous
sequence, may allow the organism to be used as a model in
testing and/or studying substances which modulate activity
of the encoded polypeptide in vitro or are otherwise
indicated to be of therapeutic potential. Knock-out mice,
for instance, may be used to test for radiosensitivity.
Conveniently, however, at least preliminary assays for such
substances may be carried out in vitro, that is within host
cells or in cell-free systems. Where an effect of a test
compound is established on cells in vitro, those cells or
cells of the same or similar type may be grafted into an
appropriate host animal for in vivo testing.

Suitable screening methods are conventional in the art.
They include techniques such as radioimmunoassay,
scintillation proximity assay and ELISA methods. Suitably
either the XRCC4 protein or fragment or DNA ligase IV or
fragment, or an analogue, derivative, variant or functional
mimetic thereof, is immobilised whereupon the other is
applied in the presence of the agents under test. In a
scintillation proximity assay a biotinylated protein
fragment is bound to streptavidin coated scintillant-
impregnated beads (produced by Amersham). Binding of
radiolabelled peptide is then measured by determination of
radioactivity induced scintillation as the radioactive
peptide binds to the immobilized fragment. Agents which
intercept this are thus inhibitors of the interaction.
Further ways and means of screening for agents which
modulate interaction between XRCC4 and DNA ligase IV are
discussed below.
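
As a hedged illustration of how read-out from a binding assay of
this kind might be reduced to a figure of merit, the sketch below
converts raw counts into percent inhibition relative to an
uninhibited control and a background control. The formula is a
common normalisation rather than one prescribed here, and the
function names, the 50% cut-off and the example counts are all
hypothetical.

```python
def percent_inhibition(signal_with_agent: float,
                       signal_without_agent: float,
                       background: float) -> float:
    """Percent reduction in specific signal caused by a test agent.

    signal_without_agent: counts with both binding partners, no agent
    background: counts with the labelled partner omitted or blocked
    """
    window = signal_without_agent - background
    if window <= 0:
        raise ValueError("uninhibited signal must exceed background")
    return 100.0 * (signal_without_agent - signal_with_agent) / window


def flag_inhibitors(readings: dict[str, float],
                    signal_without_agent: float,
                    background: float,
                    cutoff: float = 50.0) -> list[str]:
    """Return names of agents whose inhibition meets an assumed cut-off."""
    return [
        name
        for name, signal in readings.items()
        if percent_inhibition(signal, signal_without_agent, background) >= cutoff
    ]


if __name__ == "__main__":
    # Hypothetical counts per minute, for illustration only.
    readings = {"agent_A": 600.0, "agent_B": 950.0}
    print(flag_inhibitors(readings, signal_without_agent=1000.0, background=200.0))
```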

In one general aspect, the present invention provides
an assay method for a substance with ability to modulate,
e.g. disrupt or interfere with interaction or binding
between XRCC4 and DNA ligase IV, the method including:

(a) bringing into contact a substance according to the
invention including a peptide fragment of XRCC4 or a
derivative, variant or analogue thereof as disclosed, a
substance including the relevant fragment of DNA ligase IV
or a variant, derivative or analogue thereof, and a test
compound, under conditions wherein, in the absence of the
test compound being an inhibitor of interaction or binding
of said substances, said substances interact or bind; and

(b) determining interaction or binding between said
substances.

A test compound which disrupts, reduces, interferes
with or wholly or partially abolishes binding or interaction
between said substances (e.g. including a XRCC4 fragment and
including a DNA ligase IV fragment), and which may modulate
XRCC4 and/or DNA ligase IV activity, may thus be identified.

Another general aspect of the present invention
provides an assay method for a substance able to bind the
relevant region of XRCC4 or DNA ligase IV as the case may
be, the method including:

(a) bringing into contact a substance which includes a
peptide fragment of XRCC4 which interacts with DNA ligase IV
as disclosed, or which includes a peptide fragment of DNA
ligase IV which interacts with XRCC4, or a variant,
derivative or analogue of such peptide fragment, as
disclosed, and a test compound; and

(b) determining binding between said substance and the
test compound.

A test compound found to bind to the relevant portion
of XRCC4 may be tested for ability to modulate, e.g. disrupt
or interfere with, XRCC4 interaction or binding with DNA
ligase IV and/or ability to affect DNA ligase IV and/or
XRCC4 activity or other activity mediated by XRCC4 or DNA
ligase IV as discussed already above.

Similarly, a test compound found to bind the relevant
portion of DNA ligase IV may be tested for ability to
modulate, e.g. disrupt or interfere with, DNA ligase IV
interaction or binding with XRCC4 and/or ability to affect
XRCC4 and/or DNA ligase IV activity or other activity
mediated by DNA ligase IV or XRCC4 as discussed already
above.

Another general aspect of the present invention
provides an assay method for a substance able to affect DNA
ligase IV activity, the method including:

(a) bringing into contact DNA ligase IV and a test
compound; and

(b) determining DNA ligase IV activity.

DNA ligase IV activity may be determined in the
presence and absence of XRCC4 to allow for an effect of a
test compound on activity to be attributed to an effect on
interaction between DNA ligase IV and XRCC4.
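
The reasoning behind measuring activity both with and without
XRCC4 can be set out as a small decision rule. The following sketch
is one possible reading, not a prescribed analysis: it assumes four
activity measurements in arbitrary units, and the 20% threshold,
the class name and the category labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LigaseActivity:
    """Four measurements of DNA ligase IV activity (arbitrary units)."""
    alone: float                 # no XRCC4, no test compound
    alone_compound: float        # no XRCC4, test compound present
    with_xrcc4: float            # XRCC4 present, no test compound
    with_xrcc4_compound: float   # XRCC4 present, test compound present


def fractional_change(before: float, after: float) -> float:
    """Signed change in activity as a fraction of the starting value."""
    return (after - before) / before


def classify(activity: LigaseActivity, threshold: float = 0.2) -> str:
    """Attribute a compound's effect using an assumed threshold."""
    changes_alone = abs(
        fractional_change(activity.alone, activity.alone_compound)
    ) >= threshold
    changes_with_xrcc4 = abs(
        fractional_change(activity.with_xrcc4, activity.with_xrcc4_compound)
    ) >= threshold
    if changes_with_xrcc4 and not changes_alone:
        return "interaction"  # effect seen only when XRCC4 is present
    if changes_alone:
        return "ligase"       # effect on DNA ligase IV irrespective of XRCC4
    return "none"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(classify(LigaseActivity(100.0, 98.0, 300.0, 120.0)))
```

An effect seen only in the presence of XRCC4 is the pattern that
would point towards the interaction, rather than the ligase itself,
as the site of action.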

DNA ligase IV activity may be conveniently determined
by means of its adenylation. DNA ligase IV may be incubated
with radiolabelled ATP (e.g. as described below), so that
radiolabel is incorporated into the ligase. (The ligase
goes through an enzyme-AMP adenylated intermediate.) Such
radiolabel incorporation may be detected by various
approaches, including for example scintillation proximity
assay. Thus, radiolabelling of DNA ligase IV may be
determined in the presence and absence of test compound and
in the presence and absence of XRCC4. Pre-adenylation of
DNA ligase IV with radiolabel allows for assaying for
discharge of the radiolabel.

Another activity of DNA ligase IV which may be
determined is DNA ligase IV-mediated DNA strand joining.
For instance, two DNA molecules may be provided each of
which includes a site to which a PCR primer anneals under
appropriate conditions. When the two DNA molecules are
covalently linked by DNA ligase IV to form a single DNA
molecule, a PCR template results which can be amplified
using the primers. No PCR product results in the absence of
ligation. The amount of PCR product obtained in a given
reaction can be quantitated with respect to DNA ligase
activity. Another option is to attach a DNA molecule to an
insoluble support and to add another, labelled DNA molecule.
Following addition of DNA ligase IV in the presence or
absence of a test compound and a washing step, attachment of
the second molecule to the support, which can only take
place via ligation to the DNA molecule bound to the support,
can be determined by means of the label and related to DNA
ligase IV activity.

A substance found to be able to modulate DNA ligase IV
activity, e.g. in the presence or absence of XRCC4, may be
employed in a similar assay using DNA ligase I and/or DNA
ligase III, in order to assess specificity for DNA ligase
IV.

Performance of an assay method according to the present
invention may be followed by isolation and/or manufacture
and/or use of a compound, substance or molecule which tests
positive for ability to modulate interaction between XRCC4
and DNA ligase IV and/or inhibit XRCC4 or DNA ligase IV
activity or a mediated activity.

The precise format of an assay of the invention may be
varied by those of skill in the art using routine skill and
knowledge. For example, interaction between substances may
be studied in vitro by labelling one with a detectable label
and bringing it into contact with the other which has been
immobilised on a solid support. Suitable detectable labels,
especially for peptidyl substances, include 35S-methionine
which may be incorporated into recombinantly produced
peptides and polypeptides. Recombinantly produced peptides
and polypeptides may also be expressed as a fusion protein
containing an epitope which can be labelled with an
antibody.

The protein which is immobilized on a solid support may
be immobilized using an antibody against that protein bound
to a solid support or via other technologies which are known
per se. A preferred in vitro interaction may utilise a
fusion protein including glutathione-S-transferase (GST).
This may be immobilized on glutathione agarose beads. In an
in vitro assay format of the type described above a test
compound can be assayed by determining its ability to
diminish the amount of labelled peptide or polypeptide which
binds to the immobilized GST-fusion polypeptide. This may be
determined by fractionating the glutathione-agarose beads by
SDS-polyacrylamide gel electrophoresis. Alternatively, the
beads may be rinsed to remove unbound protein and the amount
of protein which has bound can be determined by counting the
amount of label present in, for example, a suitable
scintillation counter.

An assay according to the present invention may also
take the form of an in vivo assay. The in vivo assay may be
performed in a cell line such as a yeast strain or mammalian
cell line in which the relevant polypeptides or peptides are
expressed from one or more vectors introduced into the cell.

The ability of a test compound to modulate interaction
or binding between XRCC4 and DNA ligase IV may be determined
using a so-called two-hybrid assay.

For example, a polypeptide or peptide containing a
fragment of XRCC4 or DNA ligase IV as the case may be, or a
peptidyl analogue or variant thereof as disclosed, may be
fused to a DNA binding domain such as that of the yeast
transcription factor GAL4. The GAL4 transcription factor
includes two functional domains. These domains are the DNA
binding domain (GAL4DBD) and the GAL4 transcriptional
activation domain (GAL4TAD). By fusing one polypeptide or
peptide to one of those domains and another polypeptide or
peptide to the respective counterpart, a functional GAL4
transcription factor is restored only when two polypeptides
or peptides of interest interact. Thus, interaction of the
polypeptides or peptides may be measured by the use of a
reporter gene operably linked to a GAL4 DNA binding site
which is capable of activating transcription of said
reporter gene. This assay format is described by Fields and
Song, 1989, Nature 340: 245-246. This type of assay format
can be used in both mammalian cells and in yeast. Other
combinations of DNA binding domain and transcriptional
activation domain are available in the art and may be
preferred, such as the LexA DNA binding domain and the VP60
transcriptional activation domain.

To take a Lex/VP60 two hybrid screen by way of example
for the purpose of illustration, yeast or mammalian cells
may be transformed with a reporter gene construct which
expresses a selective marker protein (e.g. encoding β-
galactosidase or luciferase). The promoter of that gene is
designed such that it contains a binding site for the LexA
DNA-binding protein. Gene expression from that plasmid is
usually very low. Two more expression vectors may be
transformed into the yeast containing the selectable marker
expression plasmid, one containing the coding sequence for
the full length LexA gene linked to a multiple cloning site.
This multiple cloning site is used to clone a gene of
interest, i.e. encoding a XRCC4 or DNA ligase IV polypeptide
or peptide in accordance with the present invention, in
frame on to the LexA coding region. The second expression
vector then contains the activation domain of the herpes
simplex transactivator VP16 fused to a test peptide sequence
or more preferably a library of sequences encoding peptides
with diverse e.g. random sequences. Those two plasmids
facilitate expression from the reporter construct containing
the selectable marker only when the LexA fusion construct
interacts with a polypeptide or peptide sequence derived
from the peptide library.

A modification of this when looking for peptides or
other substances which interfere with interaction between a
XRCC4 polypeptide or peptide and DNA ligase IV polypeptide
or peptide, employs the XRCC4 or DNA ligase IV polypeptide
or peptide as a fusion with the LexA DNA binding domain, and
the counterpart DNA ligase IV or XRCC4 polypeptide or
peptide as a fusion with VP60, and involves a third
expression cassette, which may be on a separate expression
vector, from which a peptide or a library of peptides of
diverse and/or random sequence may be expressed. A
reduction in reporter gene expression (e.g. in the case of
β-galactosidase a weakening of the blue colour) results from
the presence of a peptide which disrupts the XRCC4/DNA
ligase IV interaction, which interaction is required for
transcriptional activation of the β-galactosidase gene.

Where a test substance is not peptidyl and may not be
expressed from encoding nucleic acid within a said third
expression cassette, a similar system may be employed with
the test substance supplied exogenously.
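
The logic of the interference screen just described can be modelled
in a few lines. This is a deliberately simplified, all-or-nothing
sketch: real reporter output is graded (a weakening of colour, not
an on/off switch), and the function names and library entries are
hypothetical.

```python
def reporter_active(fusions_interact: bool, disruptor_present: bool) -> bool:
    """The reporter is transcribed only when the DNA binding domain fusion
    and the activation domain fusion interact and nothing disrupts them."""
    return fusions_interact and not disruptor_present


def screen_for_disruptors(library: dict[str, bool]) -> list[str]:
    """Return library members that switch the reporter off.

    library maps a candidate name to whether it disrupts the interaction;
    in a real screen this is the unknown being measured, not an input.
    """
    return [
        name
        for name, disrupts in library.items()
        if not reporter_active(fusions_interact=True, disruptor_present=disrupts)
    ]


if __name__ == "__main__":
    # Hypothetical third-cassette peptides, for illustration only.
    candidates = {"peptide_1": False, "peptide_2": True, "peptide_3": False}
    print(screen_for_disruptors(candidates))
```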

As noted, instead of using LexA and VP60, other similar
combinations of proteins which together form a functional
transcriptional activator may be used, such as the GAL4 DNA
binding domain and the GAL4 transcriptional activation
domain.

When performing a two hybrid assay to look for
substances which interfere with the interaction between two
polypeptides or peptides it may be preferred to use
mammalian cells instead of yeast cells. The same principles
apply and appropriate methods are well known to those
skilled in the art.

The amount of test substance or compound which may be
added to an assay of the invention will normally be
determined by trial and error depending upon the type of
compound used. Typically, from about 0.01 nM to 100 µM or
more concentrations of putative inhibitor compound may be
used, for example from 0.1 to 50 µM, such as about 10 µM.
Greater concentrations may be used when a peptide is the
test substance. Even a molecule with weak binding may be a
useful lead compound for further investigation and
development.
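
Since the working concentration is found by trial and error across
several orders of magnitude, a logarithmically spaced series is a
natural starting point. The sketch below is illustrative only: it
reads the typical range given above as 0.01 nM to 100 µM, expresses
concentrations in mol/l, and uses a hypothetical function name and
spacing parameter.

```python
import math


def dilution_series(low_molar: float, high_molar: float,
                    points_per_decade: int = 1) -> list[float]:
    """Return logarithmically spaced concentrations from low to high (mol/l)."""
    if low_molar <= 0 or high_molar <= low_molar:
        raise ValueError("require 0 < low_molar < high_molar")
    decades = math.log10(high_molar / low_molar)
    steps = round(decades * points_per_decade)
    return [low_molar * 10 ** (i / points_per_decade) for i in range(steps + 1)]


if __name__ == "__main__":
    # 0.01 nM = 1e-11 mol/l; 100 µM = 1e-4 mol/l
    for concentration in dilution_series(1e-11, 1e-4):
        print(f"{concentration:.0e} M")
```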

Compounds which may be used may be natural or synthetic
chemical compounds used in drug screening programmes.
Extracts of plants which contain several characterised or
uncharacterised components may also be used.

Antibodies directed to the site of interaction in
either protein form a further class of putative inhibitor
compounds. Candidate inhibitor antibodies may be
characterised and their binding regions determined to
provide single chain antibodies and fragments thereof which
are responsible for disrupting the interaction.

Antibodies may be obtained using techniques which are
standard in the art. Methods of producing antibodies
include immunising a mammal (e.g. mouse, rat, rabbit, horse,
goat, sheep or monkey) with the protein or a fragment
thereof. Antibodies may be obtained from immunised animals
using any of a variety of techniques known in the art, and
screened, preferably using binding of antibody to antigen of
interest. For instance, Western blotting techniques or
immunoprecipitation may be used (Armitage et al., 1992,
Nature 357: 80-82). Isolation of antibodies and/or
antibody-producing cells from an animal may be accompanied
by a step of sacrificing the animal.

As an alternative or supplement to immunising a mammal
with a peptide, an antibody specific for a protein may be
obtained from a recombinantly produced library of expressed
immunoglobulin variable domains, e.g. using lambda
bacteriophage or filamentous bacteriophage which display
functional immunoglobulin binding domains on their surfaces;
for instance see WO92/01047. The library may be naive, that
is constructed from sequences obtained from an organism
which has not been immunised with any of the proteins (or
fragments), or may be one constructed using sequences
obtained from an organism which has been exposed to the
antigen of interest.

Antibodies according to the present invention may be
modified in a number of ways. Indeed the term "antibody"
should be construed as covering any binding substance having
a binding domain with the required specificity. Thus the
invention covers antibody fragments, derivatives, functional
equivalents and homologues of antibodies, including
synthetic molecules and molecules whose shape mimics that
of an antibody enabling it to bind an antigen or epitope.

Example antibody fragments, capable of binding an
antigen or other binding partner, are the Fab fragment
consisting of the VL, VH, CL and CH1 domains; the Fd
fragment consisting of the VH and CH1 domains; the Fv
fragment consisting of the VL and VH domains of a single arm
of an antibody; the dAb fragment which consists of a VH
domain; isolated CDR regions and F(ab')2 fragments, a
bivalent fragment including two Fab fragments linked by a
disulphide bridge at the hinge region. Single chain Fv
fragments are also included.



45 

A hybridoma producing a monoclonal antibody according 
to the present invention may be subject to genetic mutation 
or other changes. It will further be understood by those 
skilled in the art that a monoclonal antibody can be 
subjected to the techniques of recombinant DNA technology to 
produce other antibodies or chimeric molecules which retain 
the specificity of the original antibody. Such techniques 
may involve introducing DNA encoding the immunoglobulin 
variable region, or the complementarity determining regions 
(CDRs), of an antibody to the constant regions, or constant 
regions plus framework regions, of a different 
immunoglobulin. See, for instance, EP184187A, GB 2188638A 
or EP-A-0239400. Cloning and expression of chimeric 
antibodies are described in EP-A-0120694 and EP-A-0125023. 

Hybridomas capable of producing antibody with desired 
binding characteristics are within the scope of the present 
invention, as are host cells, eukaryotic or prokaryotic, 
containing nucleic acid encoding antibodies (including 
antibody fragments) and capable of their expression. The 
invention also provides methods of production of the 
antibodies including growing a cell capable of producing the 
antibody under conditions in which the antibody is produced, 
and preferably secreted. 

The reactivities of antibodies on a sample may be 
determined by any appropriate means. Tagging with 
individual reporter molecules is one possibility. The 
reporter molecules may directly or indirectly generate 
detectable, and preferably measurable, signals. The linkage 
of reporter molecules may be direct or indirect, and 
covalent, e.g. via a peptide bond, or non-covalent. 
Linkage via a peptide bond may be as a result of recombinant 
expression of a gene fusion encoding antibody and reporter 
molecule. 

One favoured mode is by covalent linkage of each 
antibody with an individual fluorochrome, phosphor or laser 
dye with spectrally isolated absorption or emission 
characteristics. Suitable fluorochromes include 
fluorescein, rhodamine, phycoerythrin and Texas Red. 
Suitable chromogenic dyes include diaminobenzidine. 

Other reporters include macromolecular colloidal 
particles or particulate material such as latex beads that 
are coloured, magnetic or paramagnetic, and biologically or 
chemically active agents that can directly or indirectly 
cause detectable signals to be visually observed, 
electronically detected or otherwise recorded. These 
molecules may be enzymes which catalyse reactions that 
develop or change colours or cause changes in electrical 
properties, for example. They may be molecularly excitable, 
such that electronic transitions between energy states 
result in characteristic spectral absorptions or emissions. 
They may include chemical entities used in conjunction with 
biosensors. Biotin/avidin or biotin/streptavidin and 
alkaline phosphatase detection systems may be employed. 

The mode of determining binding is not a feature of the 
present invention and those skilled in the art are able to 
choose a suitable mode according to their preference and 
general knowledge. 

Antibodies may also be used in purifying and/or 
isolating a polypeptide or peptide according to the present 
invention, for instance following production of the 
polypeptide or peptide by expression from encoding nucleic 
acid therefor. Antibodies may be useful in a therapeutic 
context (which may include prophylaxis) to disrupt XRCC4/DNA 
ligase IV interaction with a view to inhibiting their 
activity. Antibodies can for instance be micro-injected 
into cells, e.g. at a tumour site. Antibodies may be 
employed in accordance with the present invention for other 
therapeutic and non-therapeutic purposes which are discussed 
elsewhere herein. 

Other candidate inhibitor compounds may be based on 
modelling the 3-dimensional structure of a polypeptide or 
peptide fragment and using rational drug design to provide 
potential inhibitor compounds with particular molecular 
shape, size and charge characteristics. 

A compound found to have the ability to affect XRCC4 
and/or DNA ligase IV activity has therapeutic and other 
potential in a number of contexts, as discussed. For 
therapeutic treatment such a compound may be used in 
combination with any other active substance, e.g. for 
anti-tumour therapy another anti-tumour compound or therapy, 
such as radiotherapy or chemotherapy. In such a case, the assay 
of the invention, when conducted in vivo, need not measure 
the degree of inhibition of binding or of modulation of DNA 
ligase IV activity caused by the compound being tested. 
Instead the effect on DNA repair, homologous recombination, 
cell viability, cell killing (e.g. in the presence and 
absence of radio- and/or chemo-therapy), retroviral 
integration, and so on, may be measured. It may be that 
such a modified assay is run in parallel with or subsequent 
to the main assay of the invention in order to confirm that 
any such effect is as a result of the inhibition of binding 
or interaction between XRCC4 and DNA ligase IV caused by 
said inhibitor compound and not merely a general toxic 
effect. 

Thus, an agent identified using one or more primary 
screens (e.g. in a cell-free system) as having ability to 
bind XRCC4 and/or DNA ligase IV and/or modulate activity of 
XRCC4 and/or DNA ligase IV may be assessed further using one 
or more secondary screens. A secondary screen may involve 
testing for cellular radiosensitisation and/or sensitisation 
to radiomimetic drugs, testing for impairment of V(D)J 
recombination in a transfection assay, and/or testing for 
ability to potentiate homologous recombination-mediated gene 
targeting. This may be tested directly and/or using a 
transfection assay that gives a read-out only after 
homologous recombination has occurred (e.g. involving 
co-transformation or co-transfection of a cellular system with 
two plasmids that must undergo homologous recombination to 
yield an active reporter gene (such as luciferase or green 
fluorescent protein), or homologous integration of 
transfected DNA into the genome). 

Following identification of a substance or agent which 
modulates or affects XRCC4 and/or DNA ligase IV activity, 
the substance or agent may be investigated further. 
Furthermore, it may be manufactured and/or used in 
preparation, i.e. manufacture or formulation, of a 
composition such as a medicament, pharmaceutical composition 
or drug. These may be administered to individuals, e.g. for 
any of the purposes discussed elsewhere herein. 

As noted, the agent may be peptidyl, e.g. a peptide 
which includes a sequence as recited above, or may be a 
functional analogue of such a peptide. 

As used herein, the expression "functional analogue" 
relates to peptide variants or organic compounds having the 
same functional activity as the peptide in question, which 
may interfere with the binding between XRCC4 and DNA ligase 
IV. Examples of such analogues include chemical compounds 
which are modelled to resemble the three dimensional 
structure of the XRCC4 or DNA ligase IV domain in the 
contact area, and in particular the arrangement of the key 
amino acid residues as they appear in XRCC4 or DNA ligase 
IV. 

In a further aspect, the present invention provides the 
use of the above substances in methods of designing or 
screening for mimetics of the substances. 

Accordingly, the present invention provides a method of 
designing mimetics of XRCC4 or DNA ligase IV having the 
biological activity of DNA ligase IV or XRCC4 binding or 
inhibition, the activity of allosteric inhibition of DNA 
ligase IV or XRCC4 and/or the activity of modulating, e.g. 
inhibiting, XRCC4/DNA ligase IV interaction, said method 
comprising: 

(i) analysing a substance having the biological 
activity to determine the amino acid residues essential and 
important for the activity to define a pharmacophore; and, 

(ii) modelling the pharmacophore to design and/or 
screen candidate mimetics having the biological activity. 

Suitable modelling techniques are known in the art. 
This includes the design of so-called "mimetics" which 
involves the study of the functional interactions 
between the molecules and the design of 
compounds which contain functional groups arranged in such a 
manner that they could reproduce those interactions. 

The designing of mimetics to a known pharmaceutically 
active compound is a known approach to the development of 
pharmaceuticals based on a "lead" compound. This might be 
desirable where the active compound is difficult or 
expensive to synthesise or where it is unsuitable for a 
particular method of administration, e.g. peptides are not 
well suited as active agents for oral compositions as they 
tend to be quickly degraded by proteases in the alimentary 
canal. Mimetic design, synthesis and testing may be used to 
avoid randomly screening large numbers of molecules for a 
target property. 

There are several steps commonly taken in the design of 
a mimetic from a compound having a given target property. 
Firstly, the particular parts of the compound that are 
critical and/or important in determining the target property 
are determined. In the case of a peptide, this can be done 
by systematically varying the amino acid residues in the 
peptide, e.g. by substituting each residue in turn. These 
parts or residues constituting the active region of the 
compound are known as its "pharmacophore". 
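
By way of illustration only, the systematic variation of residues 
described above may be sketched as follows. The sketch enumerates 
every single-residue substitution of a peptide. The choice of alanine 
as the replacement residue, the function name substitution_scan and 
the example peptide are assumptions made for illustration and are 
not taken from the present disclosure. 

```python
def substitution_scan(peptide: str, replacement: str = "A") -> list[tuple[int, str, str]]:
    """Enumerate single-residue substitution variants of a peptide.

    Returns (1-based position, original residue, variant sequence) for each
    position, skipping positions that already carry the replacement residue.
    """
    variants = []
    for index, residue in enumerate(peptide):
        if residue == replacement:
            # Replacing a residue with itself would give an uninformative variant.
            continue
        variant = peptide[:index] + replacement + peptide[index + 1:]
        variants.append((index + 1, residue, variant))
    return variants


if __name__ == "__main__":
    # "MKTAYL" is a hypothetical peptide used only to exercise the function.
    for position, original, variant in substitution_scan("MKTAYL"):
        print(position, original, variant)
```

Each variant would then be tested for the target property, and 
positions at which substitution abolishes or reduces the property 
would be candidates for inclusion in the pharmacophore. 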

Once the pharmacophore has been found, its structure is 
modelled according to its physical properties, e.g. 
stereochemistry, bonding, size and/or charge, using data 
from a range of sources, e.g. spectroscopic techniques, 
X-ray diffraction data and NMR. Computational analysis, 
similarity mapping (which models the charge and/or volume of 
a pharmacophore, rather than the bonding between atoms) and 
other techniques can be used in this modelling process. 

In a variant of this approach, the three-dimensional 
structure of the ligand and its binding partner are 
modelled. This can be especially useful where the ligand 
and/or binding partner change conformation on binding, 
allowing the model to take account of this in the design of the 
mimetic. 

A template molecule is then selected onto which 
chemical groups which mimic the pharmacophore can be 
grafted. The template molecule and the chemical groups 
grafted on to it can conveniently be selected so that the 
mimetic is easy to synthesise, is likely to be 
pharmacologically acceptable, and does not degrade in vivo, 
while retaining the biological activity of the lead 
compound. The mimetic or mimetics found by this approach 
can then be screened to see whether they have the target 
property, or to what extent they exhibit it. Further 
optimisation or modification can then be carried out to 
arrive at one or more final mimetics for in vivo or clinical 
testing. 


Mimetics of this type together with their use in 
therapy form a further aspect of the invention. 

The present invention further provides the use of a 
peptide which includes a sequence as disclosed, or a 
derivative, active portion, analogue, variant or mimetic 
thereof, able to bind XRCC4 or DNA ligase IV and/or modulate, 
e.g. inhibit, interaction between XRCC4 and DNA ligase IV 
and/or modulate, e.g. inhibit, XRCC4 and/or DNA ligase IV 
activity, in screening for a substance able to bind DNA 
ligase IV and/or XRCC4, and/or modulate, e.g. inhibit, 
interaction between XRCC4 and DNA ligase IV, and/or inhibit 
XRCC4 and/or DNA ligase IV activity. 

Generally, such a substance, e.g. inhibitor, according 
to the present invention is provided in an isolated and/or 
purified form, i.e. substantially pure. This may include 
being in a composition where it represents at least about 
90% active ingredient, more preferably at least about 95%, 
more preferably at least about 98%. Such a composition may, 
however, include inert carrier materials or other 
pharmaceutically and physiologically acceptable excipients. 
As noted below, a composition according to the present 
invention may include in addition to an inhibitor compound 
as disclosed, one or more other molecules of therapeutic 
use, such as an anti-tumour agent. 

The present invention extends in various aspects not 
only to a substance identified as a modulator of XRCC4 and 
5 DNA ligase IV interaction and/or XRCC4 or DNA ligase IV- 
mediated activity, property or pathway, in accordance with 
what is disclosed herein, but also a pharmaceutical 
composition, medicament, drug or other composition 
comprising such a substance, a method comprising 
10 administration of such a composition to a patient, e.g. for 
a purpose discussed elsewhere herein, which may include 
preventative treatment, use of such a substance in 
manufacture of a composition for administration, e.g. for a 
purpose discussed elsewhere herein, and a method of making a 
15 pharmaceutical composition comprising admixing such a 
substance with a pharmaceutically acceptable excipient, 
vehicle or carrier, and optionally other ingredients. 

A substance according to the present invention such as 
an inhibitor of XRCC4 and DNA ligase IV interaction or 
binding may be provided for use in a method of treatment of 
the human or animal body by therapy which affects DNA repair 
or other XRCC4 or DNA ligase IV-mediated activity in cells, 
e.g. tumour cells. Other purposes of a method of treatment 
employing a substance in accordance with the present 
invention are discussed elsewhere herein. 

Thus the invention further provides a method of 
modulating DNA repair activity, particularly DSB 
end-joining, or other XRCC4 and/or DNA ligase IV-mediated 
activity, e.g. for a purpose discussed elsewhere herein, 
which includes administering an agent which modulates, 
inhibits or blocks the binding of XRCC4 to DNA ligase IV 
protein, such a method being useful in treatment where such 
modulation, inhibition or blocking is desirable. 

The invention further provides a method of treatment 
which includes administering to a patient an agent which 
interferes with the binding of XRCC4 to DNA ligase IV. 
Exemplary purposes of such treatment are discussed elsewhere 
herein. 

Whether it is a polypeptide, antibody, peptide, nucleic 
acid molecule, small molecule, mimetic or other 
pharmaceutically useful compound according to the present 
invention that is to be given to an individual, 
administration is preferably in a "prophylactically 
effective amount" or a "therapeutically effective amount" 
(as the case may be, although prophylaxis may be considered 
therapy), this being sufficient to show benefit to the 
individual. The actual amount administered, and rate and 
time-course of administration, will depend on the nature and 
severity of what is being treated. Prescription of 
treatment, e.g. decisions on dosage etc., is within the 
responsibility of general practitioners and other medical 
doctors. 

A composition may be administered alone or in 
combination with other treatments, either simultaneously or 
sequentially dependent upon the condition to be treated. 

Pharmaceutical compositions according to the present 
invention, and for use in accordance with the present 
invention, may include, in addition to active ingredient, a 
pharmaceutically acceptable excipient, carrier, buffer, 
stabiliser or other materials well known to those skilled in 
the art. Such materials should be non-toxic and should not 
interfere with the efficacy of the active ingredient. The 
precise nature of the carrier or other material will depend 
on the route of administration, which may be oral, or by 
injection, e.g. cutaneous, subcutaneous or intravenous. 

Pharmaceutical compositions for oral administration may 
be in tablet, capsule, powder or liquid form. A tablet may 
include a solid carrier such as gelatin or an adjuvant. 
Liquid pharmaceutical compositions generally include a 
liquid carrier such as water, petroleum, animal or vegetable 
oils, mineral oil or synthetic oil. Physiological saline 
solution, dextrose or other saccharide solution or glycols 
such as ethylene glycol, propylene glycol or polyethylene 
glycol may be included. 

For intravenous, cutaneous or subcutaneous injection, 
or injection at the site of affliction, the active 
ingredient will be in the form of a parenterally acceptable 
aqueous solution which is pyrogen-free and has suitable pH, 
isotonicity and stability. Those of relevant skill in the 
art are well able to prepare suitable solutions using, for 
example, isotonic vehicles such as Sodium Chloride 
Injection, Ringer's Injection, Lactated Ringer's Injection. 
Preservatives, stabilisers, buffers, antioxidants and/or 
other additives may be included, as required. 

Liposomes, particularly cationic liposomes, may be used 
in carrier formulations. 

Examples of techniques and protocols mentioned above 
can be found in Remington's Pharmaceutical Sciences, 16th 
edition, Osol, A. (ed), 1980. 

The agent may be administered in a localised manner to 
a tumour site or other desired site or may be delivered in a 
manner in which it targets tumour or other cells. 

Targeting therapies may be used to deliver the active 
agent more specifically to certain types of cell, by the use 
of targeting systems such as antibody or cell specific 
ligands. Targeting may be desirable for a variety of 
reasons, for example if the agent is unacceptably toxic, or 
if it would otherwise require too high a dosage, or if it 
would not otherwise be able to enter the target cells. 

Instead of administering these agents directly, they 
may be produced in the target cells by expression from an 
encoding gene introduced into the cells, e.g. in a viral 
vector (a variant of the VDEPT technique - see below). The 
vector may be targeted to the specific cells to be treated, or 
it may contain regulatory elements which are switched on 
more or less selectively by the target cells. 

The agent (e.g. small molecule, mimetic) may be 
administered in a precursor form, for conversion to the 
active form by an activating agent produced in, or targeted 
to, the cells to be treated. This type of approach is 
sometimes known as ADEPT or VDEPT, the former involving 
targeting the activator to the cells by conjugation to a 
cell-specific antibody, while the latter involves producing 
the activator, e.g. an enzyme, in a vector by expression 
from encoding DNA in a viral vector (see for example, 
EP-A-415731 and WO 90/07936). 

An agent may be administered in a form which is 
inactive but which is converted to an active form in the 
body. For instance, the agent may be phosphorylated (e.g. 
to improve solubility) with the phosphate being cleaved to 
provide an active form of the agent in the body. 

A composition may be administered alone or in 
combination with other treatments, either simultaneously or 
sequentially dependent upon the condition to be treated, 
such as cancer, virus infection or any other condition in 
which an XRCC4 or DNA ligase IV-mediated effect is desirable. 

Nucleic acid according to the present invention, 
encoding a polypeptide or peptide able to modulate, e.g. 
interfere with, XRCC4 and DNA ligase IV interaction or 
binding and/or induce or modulate activity or other XRCC4 or 
DNA ligase IV-mediated cellular pathway or function, may be 
used in methods of gene therapy, for instance in treatment 
of individuals, e.g. with the aim of preventing or curing 
(wholly or partially) a disorder or for another purpose as 
discussed elsewhere herein. 

Vectors such as viral vectors have been used in the 
prior art to introduce nucleic acid into a wide variety of 
different target cells. Typically the vectors are exposed 
to the target cells so that transfection can take place in a 
sufficient proportion of the cells to provide a useful 
therapeutic or prophylactic effect from the expression of 
the desired polypeptide. The transfected nucleic acid may 
be permanently incorporated into the genome of each of the 
targeted cells, providing long lasting effect, or 
alternatively the treatment may have to be repeated 
periodically. 

A variety of vectors, both viral vectors and plasmid 
vectors, are known in the art, see US Patent No. 5,252,479 

and WO 93/07282. In particular, a number of viruses have 
been used as gene transfer vectors, including papovaviruses 
such as SV40, vaccinia virus, herpesviruses, including HSV 
and EBV, and retroviruses. Many gene therapy protocols in 
the prior art have used disabled murine retroviruses. 

As an alternative to the use of viral vectors, other 
known methods of introducing nucleic acid into cells 
include electroporation, calcium phosphate 
co-precipitation, mechanical techniques such as microinjection, 
transfer mediated by liposomes and direct DNA uptake and 
receptor-mediated DNA transfer. 

Receptor-mediated gene transfer, in which the nucleic 
acid is linked to a protein ligand via polylysine, with the 
ligand being specific for a receptor present on the surface 
of the target cells, is an example of a technique for 
specifically targeting nucleic acid to particular cells. 

A polypeptide, peptide or other substance able to 
mediate or interfere with the interaction of the relevant 
polypeptide, peptide or other substance as disclosed herein, 
or a nucleic acid molecule encoding such a peptidyl 
molecule, may be provided in a kit, e.g. sealed in a 
suitable container which protects its contents from the 
external environment. Such a kit may include instructions 
for use. 

Further aspects of the present invention arise from the 
fact that the work described herein provides indication that 
mammals including humans deficient in XRCC4 and/or DNA 
ligase IV will have immune deficiencies, heightened cancer 
predisposition, particularly lymphoreticular malignancies, 
and/or will be radiosensitive. 

For example, a small but significant percentage of 
human patients have disastrously debilitating (sometimes 
fatal) reactions to standard clinical doses of radiation. 
This is unfortunate, particularly where alternative modes of 
(e.g.) cancer treatment are available. The present 
invention allows and provides for diagnosis of such 
radiosensitive patients. 

Diagnosis of XRCC4 and/or DNA ligase IV deficiency, 
which may be reduced ability of the particular polypeptide 
of an individual to interact with the other, or another 
component of a DNA repair pathway, may be used in 
conjunction with similar analysis of activity, function or 
structural integrity of other components of DNA repair 
pathways, such as Ku70, Ku80, DNA-PKcs, etc. 

A number of methods are known in the art for analysing 
biological samples from individuals to determine whether the 
individual carries an allele of a particular gene 
predisposing them to a particular disorder. Such analysis 
may be used for diagnosis or prognosis, and may 
serve to detect the presence of an existing defect (e.g. 
radiosensitivity), to help identify the type of defect (e.g. 
a factor in a manifest clinical disorder, such as cancer), 
to assist a physician in determining the severity or likely 
course of a disorder and/or to optimise treatment of it. 
Alternatively, the methods can be used to detect alleles 
that are statistically associated with a susceptibility to a 
disorder in the future, e.g. cancer, identifying individuals 
who would benefit from regular screening to provide early 
diagnosis of the disorder, e.g. cancer. 

For instance, oligonucleotides designed to hybridise to 
a region within the gene of interest may be used in 
diagnostic and prognostic screening. 

Oligonucleotide probes or primers, as well as the 
full-length gene sequence (and mutants, alleles, variants and 
derivatives) are useful in screening a test sample 
containing nucleic acid for the presence of alleles, mutants 
and variants, especially those that confer susceptibility or 
predisposition to a particular disorder, including 
radiosensitivity and cancers, the probes hybridising with a 
target sequence from a sample obtained from the individual 
being tested. The conditions of the hybridisation can be 
controlled to minimise non-specific binding, and 
stringent to moderately stringent hybridisation conditions 
are preferred. The skilled person is readily able to design 
such probes, label them and devise suitable conditions for 
the hybridisation reactions, assisted by textbooks such as 
Sambrook et al (1989) and Ausubel et al (1992). 

Nucleic acid isolated and/or purified from one or more 
cells (e.g. human) or a nucleic acid library derived from 
nucleic acid isolated and/or purified from cells (e.g. a 
cDNA library derived from mRNA isolated from the cells), may 
be probed under conditions for selective hybridisation 
and/or subjected to a specific nucleic acid amplification 
reaction such as the polymerase chain reaction (PCR). 

A method may include hybridisation of one or more (e.g. 
two) probes or primers to target nucleic acid. Where the 
nucleic acid is double-stranded DNA, hybridisation will 
generally be preceded by denaturation to produce 
single-stranded DNA. The hybridisation may be as part of a PCR 
procedure, or as part of a probing procedure not involving 
PCR. An example procedure would be a combination of PCR and 
low stringency hybridisation. A screening procedure, chosen 
from the many available to those skilled in the art, is used 
to identify successful hybridisation events and isolated 
hybridised nucleic acid. 

Binding of a probe to target nucleic acid (e.g. DNA) 
may be measured using any of a variety of techniques at the 
disposal of those skilled in the art. For instance, probes 
may be radioactively, fluorescently or enzymatically 
labelled. Other methods not employing labelling of probe 
include examination of restriction fragment length 
polymorphisms, amplification using PCR, RNAase cleavage and 
allele specific oligonucleotide probing. 
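
By way of illustration only, the principle underlying examination of 
restriction fragment length polymorphisms may be sketched as a 
digest carried out on sequence data. The sketch assumes a linear 
molecule, considers only the strand supplied, and uses the EcoRI 
recognition site GAATTC (cut after the first G). The function name 
digest and both example sequences are hypothetical and are not taken 
from the present disclosure. 

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases into the recognition site at which
    the strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    cut_positions = []
    search_from = 0
    while True:
        index = sequence.find(site, search_from)
        if index == -1:
            break
        cut_positions.append(index + cut_offset)
        search_from = index + 1
    boundaries = [0] + cut_positions + [len(sequence)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    # Hypothetical reference allele with two EcoRI sites.
    reference = "ACGTGAATTCAAGGCCTTGAATTCTTAA"
    # Hypothetical variant allele in which a point mutation destroys the second site.
    variant = "ACGTGAATTCAAGGCCTTGAGTTCTTAA"
    print(digest(reference, "GAATTC", 1))
    print(digest(variant, "GAATTC", 1))
```

A mutation that creates or destroys a recognition site changes the 
set of fragment lengths, which is the difference that would be 
observed on electrophoretic separation of the digested DNA. 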

Probing may employ the standard Southern blotting 
technique. For instance DNA may be extracted from cells and 
digested with different restriction enzymes. Restriction 
fragments may then be separated by electrophoresis on an 
agarose gel, before denaturation and transfer to a 
nitrocellulose filter. Labelled probe may be hybridised to 
the DNA fragments on the filter and binding determined. DNA 
for probing may be prepared from RNA preparations from 
cells. 

Those skilled in the art are well able to employ 
suitable conditions of the desired stringency for selective 
hybridisation, taking into account factors such as 
oligonucleotide length and base composition, temperature and 
so on. 
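
By way of illustration only, the influence of oligonucleotide length 
and base composition on hybridisation conditions may be sketched 
using the rule of thumb commonly known as the Wallace rule, under 
which the melting temperature of a short oligonucleotide is 
estimated as 2 degrees C per A or T plus 4 degrees C per G or C. 
This rule is an assumption introduced here as a rough approximation 
for short oligonucleotides and is not part of the present 
disclosure; the function name wallace_tm and the example sequence 
are hypothetical. 

```python
def wallace_tm(oligo: str) -> int:
    """Estimate the melting temperature (degrees C) of a short oligonucleotide.

    Uses the Wallace rule of thumb: Tm = 2 * (A + T) + 4 * (G + C).
    """
    oligo = oligo.upper()
    at_count = oligo.count("A") + oligo.count("T")
    gc_count = oligo.count("G") + oligo.count("C")
    if at_count + gc_count != len(oligo):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2 * at_count + 4 * gc_count


if __name__ == "__main__":
    # Hypothetical 18-mer with 10 A/T and 8 G/C bases.
    print(wallace_tm("ATGCATGCATGCATGCAT"))
```

Under this approximation a longer or more GC-rich oligonucleotide 
tolerates a higher hybridisation temperature, and hence higher 
stringency, than a shorter or more AT-rich one. 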

PCR techniques for the amplification of nucleic acid 
are described in US Patent No. 4,683,195. In general, such 
techniques require that sequence information from the ends 
of the target sequence is known to allow suitable forward 
and reverse oligonucleotide primers to be designed to be 
identical or similar to the polynucleotide sequence that is 
the target for the amplification. PCR comprises steps of 
denaturation of template nucleic acid (if double-stranded), 
annealing of primer to target, and polymerisation. The 
nucleic acid probed or used as template in the amplification 
reaction may be genomic DNA, cDNA or RNA. PCR can be used to 
amplify specific sequences from genomic DNA, specific RNA 
sequences and cDNA transcribed from mRNA, bacteriophage or 
plasmid sequences. References for the general use of PCR 
techniques include Mullis et al., Cold Spring Harbor Symp. 
Quant. Biol., 51:263 (1987); Ehrlich (ed), PCR Technology, 
Stockton Press, NY, 1989; Ehrlich et al., Science, 
252:1643-1650 (1991); and "PCR Protocols: A Guide to Methods and 
Applications", Eds. Innis et al., Academic Press, New York 
(1990). 

On the basis of amino acid sequence information, 
oligonucleotide probes or primers may be designed, taking 
into account the degeneracy of the genetic code, and, where 
appropriate, codon usage of the organism from which the candidate 
nucleic acid is derived. An oligonucleotide for use in 
nucleic acid amplification may have about 10 or fewer codons 
(e.g. 6, 7 or 8), i.e. be about 30 or fewer nucleotides in 
length (e.g. 18, 21 or 24). Generally specific primers are 
upwards of 14 nucleotides in length, but not more than 
18-20. Those skilled in the art are well versed in the design 
of primers for use in processes such as PCR. 
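
By way of illustration only, the effect of the degeneracy of the 
genetic code on oligonucleotide design may be sketched as follows. 
The sketch counts, for a short peptide, the length in nucleotides of 
an encoding oligonucleotide (three nucleotides per codon) and the 
number of distinct nucleotide sequences that could encode it under 
the standard genetic code. The function names and the example 
peptides are hypothetical and are not taken from the present 
disclosure; codon usage of the source organism is not modelled. 

```python
from math import prod

# Number of codons encoding each amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def oligo_length(peptide: str) -> int:
    """Length in nucleotides of an oligonucleotide encoding the peptide."""
    return 3 * len(peptide)


def degeneracy(peptide: str) -> int:
    """Number of distinct nucleotide sequences that encode the peptide."""
    return prod(CODON_COUNTS[residue] for residue in peptide.upper())


if __name__ == "__main__":
    # Hypothetical six-residue peptides: one low-degeneracy, one high-degeneracy.
    for peptide in ("MWKDEF", "LSRLSR"):
        print(peptide, oligo_length(peptide), degeneracy(peptide))
```

Stretches rich in methionine and tryptophan, each encoded by a 
single codon, give the least degenerate oligonucleotides, whereas 
stretches rich in leucine, serine and arginine give the most. 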

A further aspect of the present invention provides an 
oligonucleotide or polynucleotide fragment of XRCC4 or DNA 
ligase IV corresponding to part of the gene coding sequence 
or a complementary sequence, in particular for use in a 
method of obtaining and/or screening nucleic acid. The 
sequences referred to above may be modified by addition, 
substitution, insertion or deletion of one or more 
nucleotides, but preferably without abolition of ability to 
hybridise selectively with the relevant gene sequence, that 
is wherein the degree of homology of the oligonucleotide or 
polynucleotide with the sequence given is sufficiently high. 

In some preferred embodiments, oligonucleotides 
according to the present invention that are fragments of the 
relevant gene sequence, in wild-type form or in the form of 
any allele associated with susceptibility to cancer or other 
disorder, are at least about 10 nucleotides in length, more 
preferably at least about 15 nucleotides in length, more 
preferably at least about 20 nucleotides in length. Such 
fragments themselves individually represent aspects of the 
present invention. Fragments and other oligonucleotides may 
be used as primers or probes as discussed but may also be 
generated (e.g. by PCR) in methods concerned with 
determining the presence in a test sample of a sequence 
indicative of susceptibility to cancer or other disorder. 
Preferred probes or primers according to certain 
embodiments of this aspect of the present invention are 
designed to hybridise with and/or amplify a fragment of the 
relevant sequence (e.g. XRCC4 or DNA ligase IV) including 
any residue mutation at which is associated with cancer 
susceptibility. 

A number of methods are known in the art for analysing 
biological samples from individuals to determine whether the 
individual carries a gene allele with a mutation 
predisposing them to disease. Such analysis 
may be used for diagnosis or prognosis, and serve to detect 
the presence of, e.g., an existing cancer, to help identify 
the type of cancer, to assist a physician in determining the 
severity or likely course of the cancer and/or to optimise 
treatment of it. The methods may be used to detect alleles 
that are statistically associated with a susceptibility to 
cancer or other disorder in the future, e.g. early onset 
cancer, identifying individuals who would benefit from 
regular screening to provide early diagnosis of the 
disorder. 

Broadly, the methods divide into those screening for 
the presence of nucleic acid sequences and those that rely 
on detecting the presence or absence of polypeptide. The 
methods make use of biological samples from individuals that 
are suspected of containing the nucleic acid sequences or 
polypeptide. Examples of biological samples include blood, 
plasma, serum, tissue samples, tumour samples, saliva and 
urine. 

Exemplary approaches for detecting nucleic acid or 
polypeptides include: 

(a) comparing the sequence of nucleic acid in the 
sample with a XRCC4 and/or DNA ligase IV nucleic acid 
sequence to determine whether the sample from the patient 
contains one or more mutations, e.g. in a particular region, 
such as a region which interacts with counterpart DNA ligase 
IV or XRCC4, as the case may be, or other component of a DNA 
repair pathway, or, particularly in the case of DNA ligase 
IV, a catalytic region; or, 

(b) determining the presence in a sample of a XRCC4 
and/or DNA ligase IV polypeptide and, if present, 
determining whether the polypeptide includes a region 
corresponding to wild-type, and/or is mutated in such a 
region; or, 

(c) using DNA fingerprinting to compare the restriction 
pattern produced when a restriction enzyme cuts a sample of 
nucleic acid from the patient with the restriction pattern 
obtained from a particular region corresponding to that for 
the normal gene or from known mutations thereof; or, 

(d) using a specific binding member capable of binding 
to a nucleic acid sequence (either a normal sequence or a 
known mutated sequence) encoding a particular polypeptide 
fragment, the specific binding member comprising nucleic 
acid hybridisable with the relevant sequence, or substances 
comprising an antibody domain with specificity for a native 
or mutated polypeptide fragment nucleic acid sequence or the 
polypeptide encoded by it, the specific binding member being 
labelled so that binding of the specific binding member to 
its binding partner is detectable; or, 

(e) using PCR involving one or more primers based on 
the relevant normal or mutated gene sequence to screen for 
normal or mutant sequences within a particular region of the 
gene in a sample. 

A "specific binding pair" in such a context may 
comprise a specific binding member (sbm) and a binding 
partner (bp) which have a particular specificity for each 
other and which in normal conditions bind to each other in 
preference to other molecules. Examples of specific binding 
pairs are antigens and antibodies (see above), molecules and 
receptors and complementary nucleotide sequences. The 
skilled person will be able to think of many other examples 
and they do not need to be listed here. Further, the term 
"specific binding pair" is also applicable where either or 
both of the specific binding member and the binding partner 
comprise a part of a larger molecule. In embodiments in 
which the specific binding pair are nucleic acid sequences, 
they will be of a length to hybridise to each other under 
the conditions of the assay, preferably greater than 10 
nucleotides long, more preferably greater than 15 or 20 
nucleotides long. 

In most embodiments for screening for susceptibility 
alleles, the relevant nucleic acid (e.g. encoding XRCC4 
and/or DNA ligase IV) in the sample will initially be 
amplified, e.g. using PCR, to increase the amount of the 
analyte as compared to other sequences present in the 
sample. This allows the target sequences to be detected 
with a high degree of sensitivity if they are present in the 
sample. This initial step may be avoided by using highly 
sensitive array techniques that are becoming increasingly 
important in the art. 

To reiterate in further detail, the identification of 
biochemical activity and physiological function of XRCC4 and 
DNA ligase IV and particular regions thereof paves the way 
for aspects of the present invention to provide the use of 
materials and methods, such as are disclosed and discussed 
above, for establishing the presence or absence in a test 
sample of a variant form of the gene, in particular an 
allele or variant specifically associated with cancer or 
other disorder such as radiosensitivity, as discussed. This 
may be for diagnosing a predisposition of an individual to a 
disorder. It may be for diagnosing a patient with a 
disorder as being associated with the gene. 

This allows for planning of appropriate therapeutic 
and/or prophylactic treatment, permitting stream-lining of 
treatment by targeting those most likely to benefit. 

A variant form of the gene may contain one or more 
insertions, deletions, substitutions and/or additions of one 
or more nucleotides compared with the wild-type sequence 
which may or may not disrupt the transcriptional activation 
function of the region examined herein. Differences at the 
nucleic acid level are not necessarily reflected by a 
difference in the amino acid sequence of the encoded 
polypeptide. However, a mutation or other difference in a 
gene may result in a frame-shift or stop codon, which could 
seriously affect the nature of the polypeptide produced, or 
a point mutation or gross mutational change to the encoded 
polypeptide, including insertion, deletion, substitution 
and/or addition of one or more amino acids or regions in the 
polypeptide, which may affect transcriptional activation. 
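
The distinction drawn above between nucleotide differences that leave the encoded polypeptide unchanged and those that introduce a stop codon or frame-shift can be illustrated with a minimal sketch. It uses the standard genetic code; the short toy sequences and the function names are hypothetical and do not represent XRCC4 or DNA ligase IV sequence.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def classify(wild_type, variant):
    """Coarse classification of a variant relative to the wild-type sequence."""
    wt_protein, var_protein = translate(wild_type), translate(variant)
    if var_protein == wt_protein:
        return "silent"
    if (len(variant) - len(wild_type)) % 3 != 0:
        return "frame-shift"
    if len(var_protein) < len(wt_protein) and wt_protein.startswith(var_protein):
        return "premature stop"
    return "amino acid change"


# Hypothetical toy coding sequence (not a real gene fragment).
WILD_TYPE = "ATGGAGCGTAAAATCTCCTGA"
print(classify(WILD_TYPE, "ATGGAACGTAAAATCTCCTGA"))  # third-base change
print(classify(WILD_TYPE, "ATGGAGCGTTAAATCTCCTGA"))  # AAA changed to TAA
print(classify(WILD_TYPE, "ATGAGCGTAAAATCTCCTGA"))   # one base deleted
```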

There are various methods for determining the presence 
or absence in a test sample of a particular nucleic acid 
sequence, such as a sequence for XRCC4 or DNA ligase IV, or 
a fragment, mutant, variant or allele thereof. 

Tests may be carried out on preparations containing 
genomic DNA, cDNA and/or mRNA. Testing cDNA or mRNA has the 
advantage of the complexity of the nucleic acid being 
reduced by the absence of intron sequences, but the possible 
disadvantage of extra time and effort being required in 
making the preparations. RNA is more difficult to 
manipulate than DNA because of the wide-spread occurrence of 
RNases. 



Nucleic acid in a test sample may be sequenced and the 
sequence compared with the relevant wild-type sequence to 
determine whether or not a difference is present. If so, 
the difference can be compared with known susceptibility 
alleles, to determine whether the test nucleic acid contains 
one or more of the variations indicated, or the difference 
can be investigated for association with the disorder of 
interest. 
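
The comparison described above can be sketched as a position-by-position check followed by a lookup against known alleles. This is an illustrative sketch only: it assumes the two sequences are already aligned and of equal length, and the toy sequences and the entry in `KNOWN_ALLELES` are hypothetical placeholders rather than real susceptibility alleles.

```python
def find_substitutions(wild_type, test):
    """Return (1-based position, wild-type base, test base) for each difference.

    Assumes the sequences are already aligned and of equal length.
    """
    if len(wild_type) != len(test):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i + 1, w, t)
            for i, (w, t) in enumerate(zip(wild_type, test)) if w != t]


# Hypothetical table of known alleles, keyed by (position, wild-type, variant).
KNOWN_ALLELES = {(10, "A", "T"): "hypothetical known susceptibility allele"}


def annotate(differences, known=KNOWN_ALLELES):
    """Label each difference as known, or flag it for further investigation."""
    return [(d, known.get(d, "uncharacterised; investigate for association"))
            for d in differences]


WILD_TYPE = "ATGGAGCGTAAAATCTCCTGA"    # hypothetical toy sequence
TEST_SAMPLE = "ATGGAGCGTTAAATCTCCTGA"  # differs at position 10
print(annotate(find_substitutions(WILD_TYPE, TEST_SAMPLE)))
```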

Since it will not generally be time- or labour-efficient 
to sequence all nucleic acid in a test sample or 
even the whole gene for XRCC4 or DNA ligase IV, a specific 
amplification reaction such as PCR using one or more pairs 
of primers may be employed to amplify the region of interest 
in the nucleic acid. The amplified nucleic acid may then be 
sequenced as above, and/or tested in any other way to 
determine the presence or absence of a particular feature. 
Nucleic acid for testing may be prepared from nucleic acid 
removed from cells or in a library using a variety of other 
techniques such as restriction enzyme digest and 
electrophoresis. 
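
How a pair of primers delimits the region that is amplified can be sketched as a simple string search. This is a simplified model that assumes exact primer matches on a linear template; the template, primers and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Return the region bounded by the two primers, or None if a site is missing.

    The template is the top strand written 5' to 3'. The forward primer matches
    the top strand directly; the reverse primer (also written 5' to 3') anneals
    to the top strand, so its reverse complement appears in the template string.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Hypothetical template with flanking sequence around a toy region of interest.
TEMPLATE = "CCCATGGAGCGTAAAATCTCCTGAGGG"
print(amplicon(TEMPLATE, "ATGGAG", "TCAGGA"))
```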

Nucleic acid may be screened using a variant- or 
allele-specific probe. Such a probe corresponds in sequence 
to a region of the relevant gene, or its complement, 
containing a sequence alteration known to be associated with 
susceptibility to cancer or other disorder of interest. 
Under suitably stringent conditions, specific hybridisation 
of such a probe to test nucleic acid is indicative of the 
presence of the sequence alteration in the test nucleic 
acid. For efficient screening purposes, more than one probe 
may be used on the same test sample. 

Allele- or variant-specific oligonucleotides may 
similarly be used in PCR to specifically amplify particular 
sequences if present in a test sample. Assessment of whether 
a PCR band contains a gene variant may be carried out in a 
number of ways familiar to those skilled in the art. The 
PCR product may for instance be treated in a way that 
enables one to display the mutation or polymorphism on a 
denaturing polyacrylamide DNA sequencing gel, with specific 
bands that are linked to the gene variants being selected. 

SSCP heteroduplex analysis may be used for screening 
DNA fragments for sequence variants/mutations. It generally 
involves amplifying radiolabelled 100-300 bp fragments of 
the gene, diluting these products and denaturing at 95 °C. 
The fragments are quick-cooled on ice so that the DNA 
remains in single stranded form. These single stranded 
fragments are run through acrylamide based gels. 
Differences in the sequence composition will cause the 
single stranded molecules to adopt different conformations 
in this gel matrix making their mobility different from wild 
type fragments, thus allowing detection of mutations in the 
fragments being analysed relative to a control fragment upon 
exposure of the gel to X-ray film. 

Fragments with altered mobility/conformations may be 
directly excised from the gel and directly sequenced for 
mutation. 

Sequencing of a PCR product may involve precipitation 
with isopropanol, resuspension and sequencing using a TaqFS+ 
Dye terminator sequencing kit. Extension products may be 
electrophoresed on an ABI 377 DNA sequencer and data 
analysed using Sequence Navigator software. 

A further possible screening approach employs a protein 
truncation test (PTT) 
assay in which fragments are amplified with primers that 
contain the consensus Kozak initiation sequences and a T7 
RNA polymerase promoter. These extra sequences are 
incorporated into the 5' primer such that they are in frame 
with the native coding sequence of the fragment being 
analysed. These PCR products are introduced into a coupled 
transcription/translation system. This reaction allows the 
production of RNA from the fragment and translation of this 
RNA into a protein fragment. PCR products from controls 
make a protein product of a wild type size relative to the 
size of the fragment being analysed. If the PCR product 
analysed has a frame-shift or nonsense mutation, the assay 
will yield a truncated protein product relative to controls. 
The size of the truncated product is related to the position 
of the mutation, and the relevant region of the gene from 
this patient may be sequenced to identify the truncating 
mutation. 
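
The relationship between the position of a truncating mutation and the size of the product can be sketched by counting codons up to the first in-frame stop. This is an illustration only: the sequences are hypothetical toy fragments, and the figure of about 0.11 kDa per residue is a common rule-of-thumb average that is assumed here, not a value taken from the disclosure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
AVERAGE_RESIDUE_KDA = 0.11  # assumed rule-of-thumb average mass per residue


def translated_length(coding_sequence):
    """Number of residues made before the first in-frame stop codon."""
    residues = 0
    for i in range(0, len(coding_sequence) - 2, 3):
        if coding_sequence[i:i + 3] in STOP_CODONS:
            break
        residues += 1
    return residues


def approximate_mass_kda(coding_sequence):
    """Rough product mass from residue count and the assumed average mass."""
    return translated_length(coding_sequence) * AVERAGE_RESIDUE_KDA


CONTROL = "ATGGAGCGTAAAATCTCCTGA"   # hypothetical control fragment
NONSENSE = "ATGGAGCGTTAAATCTCCTGA"  # stop codon introduced at the fourth codon
print(translated_length(CONTROL), translated_length(NONSENSE))
```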

An alternative or supplement to looking for the 
presence of variant sequences in a test sample is to look 
for the presence of the normal sequence, e.g. using a 
suitably specific oligonucleotide probe or primer. 

Approaches which rely on hybridisation between a probe 
and test nucleic acid and subsequent detection of a mismatch 
may be employed. Under appropriate conditions (temperature, 
pH etc.), an oligonucleotide probe will hybridise with a 
sequence which is not entirely complementary. The degree of 
base-pairing between the two molecules will be sufficient 
for them to anneal despite a mis-match. Various approaches 
are well known in the art for detecting the presence of a 
mis-match between two annealing nucleic acid molecules. 

For instance, RNase A cleaves at the site of a mis-match. 
Cleavage can be detected by electrophoresing test 
nucleic acid to which the relevant probe or probes have 
annealed and looking for smaller molecules (i.e. molecules 
with higher electrophoretic mobility) than the full length 
probe/test hybrid. Other approaches rely on the use of 
enzymes such as resolvases or endonucleases. 
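
The reasoning behind mis-match cleavage, in which a cut at the mis-match yields molecules smaller than the full-length hybrid, can be sketched as follows. The model is idealised: it assumes exactly one cut at each mis-matched position, writes probe and target in the same sense for simplicity, and uses hypothetical toy sequences.

```python
def mismatch_positions(probe, target):
    """0-based positions at which an annealed probe and target disagree.

    Both sequences are written in the same sense and must be of equal length.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must be of equal length")
    return [i for i, (p, t) in enumerate(zip(probe, target)) if p != t]


def cleavage_fragments(length, positions):
    """Fragment lengths expected if the hybrid is cut once at each mis-match."""
    cuts = [0] + sorted(positions) + [length]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]


PROBE = "ATGGAGCGTAAAATCTCCTGA"   # hypothetical normal-sequence probe
TARGET = "ATGGAGCGTTAAATCTCCTGA"  # hypothetical test sequence with one change
sites = mismatch_positions(PROBE, TARGET)
print(sites, cleavage_fragments(len(PROBE), sites))
```

A single full-length fragment indicates no mis-match, whereas two or more smaller fragments indicate that the probe and test sequence differ.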

Thus, an oligonucleotide probe that has the sequence of 
a region of the normal gene (either sense or anti-sense 
strand) in which at least one mutation associated with, 
e.g., cancer susceptibility is known to occur, may be 
annealed to test nucleic acid and the presence or absence of 
a mis-match determined. Detection of the presence of a 
mis-match may indicate the presence in the test nucleic acid of 
a mutation associated with, e.g., cancer susceptibility. On 
the other hand, an oligonucleotide probe that has the 
sequence of a region of the gene including a mutation 
associated with, e.g., cancer susceptibility may be annealed 
to test nucleic acid and the presence or absence of a 
mis-match determined. The presence of a mis-match may indicate 
that the nucleic acid in the test sample has the normal 
sequence. In either case, a battery of probes to different 
regions of the gene may be employed. Indeed, probes may be 
included with probes or other materials for other genes for 
stream-lined testing. 

The presence of differences in sequence of nucleic acid 
molecules may be detected by means of restriction enzyme 
digestion, such as in a method of DNA fingerprinting where 
the restriction pattern produced when one or more 
restriction enzymes are used to cut a sample of nucleic acid 
is compared with the pattern obtained when a sample 
containing the normal gene or a variant or allele is 
digested with the same enzyme or enzymes. 
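
The comparison of restriction patterns can be sketched by digesting two sequences in silico and comparing the resulting fragment lengths. The sketch uses the EcoRI recognition site (GAATTC, cut after the first G) as an example enzyme; the toy sequences and the function name are hypothetical, and only a linear molecule and a single strand are modelled.

```python
def digest(sequence, site, cut_offset):
    """Sorted fragment lengths from cutting a linear sequence at every site.

    cut_offset is the position within the recognition site at which the
    enzyme cuts on the strand shown.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


# Hypothetical sequences: the variant has lost the second GAATTC site.
NORMAL = "AAAAGAATTCTTTTTTGAATTCCC"
VARIANT = "AAAAGAATTCTTTTTTGAGTTCCC"

normal_pattern = digest(NORMAL, "GAATTC", 1)
variant_pattern = digest(VARIANT, "GAATTC", 1)
print(normal_pattern, variant_pattern, normal_pattern != variant_pattern)
```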

A test sample of nucleic acid may be provided for 
example by extracting nucleic acid from cells, e.g. in 
saliva or preferably blood, or for pre-natal testing from 
the amnion, placenta or foetus itself. 

Nucleic acid according to the present invention, such 
as a full-length coding sequence or oligonucleotide probe or 
primer, may be provided as part of a kit, e.g. in a suitable 
container such as a vial in which the contents are protected 
from the external environment. The kit may include 
instructions for use of the nucleic acid, e.g. in PCR and/or 
a method for determining the presence of nucleic acid of 
interest in a test sample. A kit wherein the nucleic acid 
is intended for use in PCR may include one or more other 
reagents required for the reaction, such as polymerase, 
nucleosides, buffer solution etc. The nucleic acid may be 
labelled. A kit for use in determining the presence or 
absence of nucleic acid of interest may include one or more 
articles and/or reagents for performance of the method, such 
as means for providing the test sample itself, e.g. a swab 
for removing cells from the buccal cavity or a syringe for 
removing a blood sample (such components generally being 
sterile). In a further aspect, the present invention 
provides an apparatus for screening for XRCC4 and/or DNA 
ligase IV nucleic acid, the apparatus comprising storage 
means including the relevant gene nucleic acid sequence, or 
a fragment thereof, the stored sequence being used to 
compare the sequence of the test nucleic acid to determine 
the presence of mutations. 

There are various methods for determining the presence 
or absence in a test sample of a particular polypeptide, 
such as a polypeptide including a fragment of XRCC4 or DNA 
ligase IV corresponding to a particular region involved in 
interaction with counterpart DNA ligase IV or XRCC4, as the 
case may be, involved in interaction with one or more other 
proteins or components of a DNA repair pathway, or having a 
particular biological activity, such as DNA ligase enzymatic 
activity. 

A sample may be tested for the presence of a binding 
partner for a specific binding member such as an antibody 
(or mixture of antibodies), specific for one or more 
particular variants of the polypeptide, i.e. wild-type or a 
mutant, variant or allele thereof. 

A sample may be tested for the presence of a binding 
partner for a specific binding member such as an antibody 
(or mixture of antibodies), specific for the polypeptide. 
In such cases, the sample may be tested by being 
contacted with a specific binding member such as an antibody 
under appropriate conditions for specific binding, before 
binding is determined, for instance using a reporter system 
as discussed. Where a panel of antibodies is used, 
different reporting labels may be employed for each antibody 
so that binding of each can be determined. 

A specific binding member such as an antibody may be 
used to isolate and/or purify its binding partner 
polypeptide from a test sample, to allow for sequence and/or 
biochemical analysis of the polypeptide to determine whether 
it has the sequence and/or properties of the polypeptide of 
interest, or if it is a mutant or variant form. Amino acid 
sequencing is routine in the art using automated sequencing 
machines. 

Various further aspects and embodiments of the present 
invention will be apparent to those skilled in the art in 
view of the present disclosure. Certain aspects and 
embodiments of the invention will now be illustrated by way 
of example and with reference to the figures discussed 
already above. 

DETERMINATION OF BIOLOGICAL ACTIVITY OF XRCC4 

Generation of antisera that recognise XRCC4 

With the aim of gaining insights into the mechanism of 
XRCC4 action, it was decided to try to characterise the 
human protein biochemically. Towards this end, full-length 
human XRCC4 and the C-terminal region of XRCC4 comprising 
residues 201-344 were expressed in Escherichia coli as 
hexa-histidine-tagged proteins. After purification to 
homogeneity, each antigen was then used to raise polyclonal 
antisera in rabbits. 

In the course of these studies, we observed that 
recombinant full-length XRCC4 runs anomalously upon 
SDS-PAGE, with an apparent molecular mass of ~55 kDa, which is 
considerably larger than the predicted molecular weight of 
38 kDa. Untagged and His-tagged versions of XRCC4 were 
found to behave similarly. Although the reason for this is 
currently unclear, this might reflect the fact that XRCC4 
contains an unusually large proportion of glutamic acid 
residues, increasing the negative charge of the 
protein. One possible result of this would be a net 
decrease in the amount of SDS bound to the protein which 
would decrease the mobility of XRCC4 upon SDS-PAGE analysis. 

Western blot analyses revealed that each of the 
anti-XRCC4 antisera raised was capable of recognising less than 1 
ng of recombinant XRCC4 protein. To establish whether these 
antisera are capable of detecting endogenous XRCC4 in 
mammalian cell lysates, crude HeLa cell nuclear extracts 
were subjected to SDS-PAGE followed by Western immunoblot 
analysis. 

Importantly, each antiserum, but none of the pre-immune 
sera, was found to recognise a HeLa cell protein of 55-60 
kDa, which is in good agreement with the size of recombinant 
XRCC4. In addition, each immune serum also detects several 
other polypeptides weakly. Although the identities of these 
are not established, some may correspond to alternative 
forms of XRCC4 or its proteolytic degradation products. For 
instance, one band in particular is likely to represent an 
N-terminal XRCC4 proteolytic product, because it is recognised 
by all sera raised against the full-length protein but not 
by serum SJ5 that was raised against the XRCC4 C-terminal 
region. Interestingly, despite the high degree of sequence 
conservation between XRCC4 in rodents and humans (Li et al., 
1995), we have been unable to detect XRCC4 in extracts of 
mouse or hamster cells by direct Western blotting using 
these antibodies. This could in part reflect low 
immunological cross-reactivity between the human and rodent 
proteins. However, given the evolutionary conservation of 
XRCC4, the model that we currently favour is that, as is the 
case for other DNA DSB repair factors, such as Ku and 
DNA-PKcs (Blunt et al., 1995; Finnie et al., 1995; Danska et 
al., 1996), XRCC4 is expressed at much lower levels in 
rodent cells than in human cells (also, see below). 

To enhance further the specificity of anti-XRCC4 
antiserum SJ4B, this was subjected to immuno-affinity 
chromatography using XRCC4 that had been attached covalently 
to Sepharose beads. Significantly, whereas the crude serum 
recognises a number of polypeptides in HeLa whole cell 
extracts in addition to full-length XRCC4, much of the 
reactivity towards the other proteins is recovered in the 
flow-through fractions, resulting in the affinity-purified 
antibody material (eluate) having improved specificity and 
selectivity as compared to the unfractionated serum. 

XRCC4 is a nuclear phosphoprotein and serves as an effective 
substrate for DNA-PK in vitro 

As a first step towards establishing the biochemical 
function of XRCC4, we decided to try to determine its 
sub-cellular localisation. Nuclear and cytosolic fractions were 
prepared from HeLa cells and were subjected to Western blot 
analysis using affinity-purified XRCC4 antibody SJ4B. The 
integrity of the fractions was established by also probing 
with antisera against Sp1 which is located predominantly in 
the nuclear fraction. 

Notably, these studies revealed that XRCC4 is present 
in the nuclear extract, with low amounts being detectable in 
the cytosolic fraction. These data therefore reveal that 
XRCC4 is a nuclear protein and are consistent with models in 
which XRCC4 serves as part of a DNA DSB repair apparatus. 

During the course of the above studies, we observed 
that HeLa XRCC4 reproducibly migrates more slowly than 
recombinant XRCC4 on SDS-PAGE, suggesting that human XRCC4 
is modified post-translationally. To determine whether this 
reflects XRCC4 phosphorylation, HeLa nuclear extract was 
either mock-treated, treated with λ protein phosphatase, or 
was treated with λ phosphatase in the presence of 
phosphatase inhibitors. 

Significantly, Western analysis of these samples 
revealed that λ phosphatase increases the SDS-PAGE mobility 
of HeLa XRCC4 so that it is now equivalent to that of the 
recombinant protein, whereas this effect is abrogated by 
phosphatase inhibitors. These data therefore reveal that 
XRCC4 is phosphorylated to high stoichiometry in HeLa cell 
extracts and suggest that this modification is used to 
modulate XRCC4 activity in vivo. 

In light of this, and because cells deficient in XRCC4 
have very similar phenotypes to those defective in 
components of DNA-PK, we tested whether DNA-PK is able to 
phosphorylate XRCC4 in vitro. XRCC4 indeed serves as an 
effective substrate for DNA-dependent phosphorylation by 
DNA-PK, so DNA-PK may control XRCC4 activity within the 
cell. 

Endogenous XRCC4 appears to be complexed with another 
protein(s) 

There are various ways in which XRCC4 might function in 
DNA DSB repair and V(D)J recombination. 
One possibility the inventors considered is that it 
interacts directly with DNA and plays a role in repairing 
DNA damage or signalling its presence to the cell. However, 
we have been unable to detect binding of recombinant XRCC4 
to various DNA species in electrophoretic mobility shift 
assays. Furthermore, when HeLa nuclear extracts are passed 
through DNA-agarose columns under salt concentrations in 
which many DNA binding proteins are retained, the majority 
of endogenous XRCC4 flows through. 

These data therefore argue that XRCC4 does not bind 
avidly to DNA. 

Another possible role for XRCC4 considered by the 
inventors is for it to interact with another component of 
the DNA DSB repair apparatus. As an approach to address 
this idea, we investigated the biochemical fractionation of 
XRCC4 and other known and potential DNA DSB repair factors 
upon gel-filtration chromatography on Superose-6. To 
disrupt possible non-specific protein-protein associations, 
such experiments were performed under stringent conditions 
of 1 M NaCl. 

These studies revealed that recombinant untagged XRCC4 
elutes in a manner consistent with a mass of just over 66 
kDa, which is larger than the predicted XRCC4 monomer 
molecular weight (Li et al., 1995) and its apparent mass as 
determined by SDS-PAGE. This suggests either that XRCC4 is 
a monomeric protein with shape characteristics causing it to 
behave anomalously upon gel-filtration, or that it exists in 
solution as a multimer, most likely a dimer. 
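
The arithmetic behind the dimer suggestion can be made explicit with a minimal sketch that divides the apparent gel-filtration mass by the predicted monomer mass of 38 kDa quoted earlier. The function name is hypothetical, and the calculation assumes a roughly globular protein; as the text notes, an unusual shape could equally account for the observed elution.

```python
def nearest_oligomer_state(apparent_kda, monomer_kda):
    """Nearest whole number of monomers for an apparent native mass.

    Assumes a roughly globular protein; shape effects are ignored.
    """
    return max(1, round(apparent_kda / monomer_kda))


# Values quoted in the text: ~66 kDa by gel filtration, 38 kDa predicted monomer.
ratio = 66 / 38
print(round(ratio, 2), nearest_oligomer_state(66, 38))
```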

Most significantly, gel-filtration analysis of HeLa 
nuclear extract in the presence of 1 M NaCl reveals that 
endogenous XRCC4 fractionates in a manner consistent with a 
molecular mass of around 200 kDa, which is markedly higher 
than that for recombinant XRCC4. These data therefore 
suggest strongly that HeLa XRCC4 is associated with another 
protein(s). We took the same set of gel-filtration 
fractions tested above for XRCC4 and examined them for the 
presence of Ku, DNA-PKcs, and DNA ligases I, III and IV. 

Significantly, although some overlap was evident in 
each case, the XRCC4 elution profile did not parallel those 
exhibited for DNA ligase I, Ku or DNA-PKcs. Thus, ligase I 
peaked at ~150 kDa which is slightly larger than the 
predicted monomer molecular weight of 102 kDa, DNA-PKcs (465 
kDa) eluted at around 200 kDa which may indicate that the 
tertiary structure of DNA-PK is disrupted under these 
conditions, and Ku elution peaked at ~150 kDa, consistent 
with the predicted size of a Ku70/Ku80 heterodimer. 

In marked contrast, the elution profile of XRCC4 was 
found to be virtually identical to those of DNA ligases III 
and IV. These results therefore raised the possibility that 
XRCC4 exists in stable association with either DNA ligase 
III or IV. 

HeLa cell XRCC4 co-immunoprecipitates with DNA ligase IV 

To further test for possible interactions between XRCC4 
and the factors described above, we immunoprecipitated XRCC4 
from its peak gel-filtration fractions in the presence of 1 
M NaCl and 50 µg/ml ethidium bromide (to abolish 
non-specific interaction mediated via DNA), and tested the 
resulting precipitated material for the presence of Ku, 
DNA-PKcs, and ligases I, III, and IV. Significantly, Western 
immunoblot analyses revealed that DNA-PKcs, Ku, and DNA 
ligase I that were present in the XRCC4 fractions did not 
co-immunoprecipitate with XRCC4, confirming that XRCC4 does 
not interact stably with any of these factors. 

To assay for possible interactions between XRCC4 and a 
DNA ligase enzyme, we employed the fact that mammalian DNA 
ligases form covalently-linked adenylate complexes 
(Tomkinson et al., 1991; Wei et al., 1995; Danska et al., 
1996; Robins and Lindahl, 1996). When the XRCC4-containing 
gel-filtration fraction was incubated with [α-32P]-ATP and 
was then examined by SDS-PAGE followed by autoradiography, 
adenylated proteins of approximately 120 kDa and 100 kDa 
were detected, which correspond to DNA ligase I and a 
mixture of DNA ligases III and IV, respectively. 

To see whether any of these ligases associate with 
XRCC4, unlabelled extract was incubated with pre-immune or 
anti-XRCC4 antisera in the presence of 1 M NaCl then, after 
stringent washing, the immunoprecipitated material was 
incubated with [α-32P]-ATP and tested for 
radioactively-labelled adenylated proteins. 

Significantly, these studies revealed that an 
adenylated protein species of ~100 kDa, corresponding to DNA 
ligase III and/or IV, is immunoprecipitated efficiently by 
the affinity purified XRCC4 antiserum but not by pre-immune 
sera. By contrast, the adenylated species corresponding to 
DNA ligase I is not recovered. Importantly, and consistent 
with the fact that the adenylate moiety of adenylated-DNA 
ligase complexes is discharged in the presence of ligatable 
polynucleotide substrates, the radiolabel associated with 
the XRCC4-precipitated material is lost upon incubation in 
the presence of DNA that has been nicked by treatment with 
DNase I. 

To rule out the possibility that the immunoprecipitated 
ligase was being recognised directly by the anti-XRCC4 
antiserum, we performed parallel immunoprecipitation 
reactions on extracts derived from the hamster cell lines K1 
and XR-1, which contain and lack XRCC4 protein, 
respectively. Importantly, the ~100 kDa adenylated ligase 
species is recovered from K1 extracts but not from XR-1 
extracts. These data therefore reveal that the ligase is 
not recognised by the antiserum directly and, instead, is 
immunoprecipitated via its association with XRCC4. 

Taken together, the above results reveal that XRCC4 
forms a tight salt-stable interaction with DNA ligase III 
and/or DNA ligase IV. To establish which of these two 
enzymes is associated with XRCC4, we took advantage of the 
fact that ligases III and IV have different abilities to 
join single-strand breaks in polynucleotide substrates 
containing one DNA strand and one RNA strand. Thus, whereas 
DNA ligase III can catalyse joining in both 
oligo(rA)·poly(dT) and oligo(dT)·poly(rA) substrates, ligase 
IV is only able to mediate joining of the latter (Robins, 
1996). 

In light of this, we performed adenylation assays on 
material immunoprecipitated with XRCC4 and then incubated 
the labelled immunoprecipitates with either 
oligo(dT)·poly(rA) or oligo(rA)·poly(dT). Notably, only 
oligo(dT)·poly(rA) resulted in dissociation of the adenylate 
group from the ligase that is immunoprecipitated with XRCC4. 

These results therefore suggest strongly that XRCC4 
interacts tightly and specifically with DNA ligase IV but 
not with DNA ligase III. 
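
The inference drawn from the substrate-specificity experiment can be written out as a small lookup. The substrate preferences are those stated in the text (Robins, 1996); the data structure and function name are hypothetical. The sketch assumes a single ligase species in the immunoprecipitate, so a mixture of ligases III and IV would be indistinguishable from ligase III alone.

```python
# Hybrid substrates each ligase can join, as stated in the text.
JOINS = {
    "DNA ligase III": {"oligo(rA)·poly(dT)", "oligo(dT)·poly(rA)"},
    "DNA ligase IV": {"oligo(dT)·poly(rA)"},
}


def consistent_ligases(discharging_substrates):
    """Ligases whose joinable substrates match those that discharged the label.

    Assumes a single ligase species is present in the precipitated material.
    """
    observed = set(discharging_substrates)
    return [name for name, substrates in JOINS.items() if substrates == observed]


# Observation reported in the text: only oligo(dT)·poly(rA) released the label.
print(consistent_ligases(["oligo(dT)·poly(rA)"]))
```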

XRCC4 and ligase IV co-purify extensively 

To confirm the interaction between XRCC4 and DNA ligase 
IV, and to gain insight into what proportion of the two 
proteins exists in this complex, we purified DNA ligase IV 
using established protocols (Robins, 1996) and tested for 
the presence of ligase IV and XRCC4 by quantitative Western 
immunoblot analyses at each chromatographic stage. 

As demonstrated previously, we observed that DNA 
ligases III and IV co-elute during gel filtration 
chromatography (Figure 1A) but are resolved from one another 
by chromatography on Mono-S (Figure 1B). Significantly, 
XRCC4 tracks with ligase IV throughout these procedures but, 
in contrast, becomes separated from DNA ligase III at the 
Mono-S chromatography step (Figures 1A and 1B). 
Furthermore, XRCC4 is present even in more highly purified 
samples of DNA ligase IV generated via subsequent 
chromatography on Mono Q, and immunoblot analyses of near 
homogeneous ligase IV preparations that we had prepared 
previously demonstrate the existence of XRCC4. Samples 
further purified on a Mono Q column (Robins, 1996) were 
analysed on a 10% SDS-polyacrylamide gel by immunoblots 
testing for the presence of XRCC4 or DNA ligase IV. 
Molecular sizes were estimated from the migration of 
Kaleidoscope pre-stained markers. 

In additional studies, we have also observed that XRCC4
and DNA ligase IV co-purify on phenyl-Sepharose. Indeed,
short of incubation with harsh ionic detergents, we have yet
to find a procedure to separate these two proteins.

Interestingly, the XRCC4 that co-purifies with ligase
IV corresponds to the phosphorylated form of the protein as
evidenced by its SDS-PAGE mobility and by the fact that this
mobility is increased by phosphatase treatment.

Finally, it is particularly noteworthy that XRCC4 and
ligase IV co-purify almost quantitatively with one another,
and that no free pools of either factor are evident (for
example, see Figures 1A and 1B). This therefore suggests
that XRCC4 and DNA ligase IV are present at similar levels
in the cell and that virtually all of each polypeptide
exists in a complex with its partner.

XRCC4 interacts with the C-terminal portion of DNA ligase IV
that comprises two BRCT homology domains

To gain insights into the basis for the highly specific
binding of DNA ligase IV to XRCC4, we set out to
determine which region(s) of ligase IV are involved in this
interaction.

As depicted in Figure 2, DNA ligases I, III and IV
display high levels of sequence similarity to one another
within a section that has been defined as the core ligase
catalytic domain (Wei et al., 1995). In addition, each
ligase possesses discrete N- and/or C-terminal extensions
that have been proposed to confer unique properties on the
three enzymes. Significantly, although the C-terminal
extensions of DNA ligases III and IV show very little
homology with one another at the primary sequence level,
they possess one and two copies, respectively, of the
recently identified BRCT homology domain (Koonin et al.,
1996; Callebaut and Mornon, 1997; see Discussion below).

With the aim of finding which region(s) of ligase IV
interact with XRCC4, we divided the ligase IV polypeptide
into three portions: an N-terminal region (corresponding to
amino acid residues 1-198) that exhibits homology with
ligases I and III, a central region (residues 199-549) that
shows highest levels of homology with ligases I and III and
which contains the ligase catalytic site, and a C-terminal
region (residues 550-844) that contains the two BRCT
homology domains (Figure 2). After transcribing and
translating the three regions separately in vitro, these
were tested for an ability to bind to Sepharose beads or to
Sepharose beads containing covalently-attached XRCC4.
Samples of LigIV(1-198), LigIV(199-549), LigIV(550-844)
and luciferase were applied to XRCC4-Sepharose beads or
negative control beads (ON) and unbound proteins collected
(FT). After washes with 0.1 M and 1.0 M NaCl, bound
proteins were eluted with gel loading buffer (SDS). After
SDS-PAGE, the [35S]methionine-labelled fragments were
detected by autoradiography.

The N-terminal and central fragments of DNA ligase IV
fail to bind detectably to the XRCC4-beads, as is the case
for the luciferase protein that was employed as a control.
In marked contrast, the ligase IV C-terminal fragment is
retained almost quantitatively on the XRCC4-beads but not on
control beads lacking XRCC4. Moreover, the binding of the
C-terminal portion of ligase IV to XRCC4 appears to be very
strong, as evidenced by the fact that the ligase IV
C-terminal region is not eluted by washing at 1 M NaCl and is
only recovered following addition of the ionic detergent
SDS.

Finally, to further address the specificity of the
above interaction, we assessed whether XRCC4-containing
beads could be used to purify the C-terminal region of
ligase IV from crude bacterial lysates. To do this, an
unfractionated extract of E. coli expressing this region
(LIGIV(550-844)) at fairly low levels was incubated with
XRCC4-containing Sepharose beads and unbound proteins
collected. Bound material was then eluted with step-wise
increases in salt concentration, followed by a final
elution in the presence of gel loading buffer (SDS).

Strikingly, as shown by total Coomassie-blue protein
staining of an SDS-polyacrylamide gel containing these
fractions, this procedure results in the C-terminal region
of ligase IV being purified to virtual homogeneity in a
single step. The identity of this polypeptide as the ligase
IV C-terminus was confirmed by Western blot analyses, and
this protein was not retained by Sepharose beads alone.

Taken together, these results attest to the extreme
strength and specificity of the interaction between the
ligase IV C-terminal region and XRCC4.

MATERIALS AND METHODS

Enzymes, antibodies and DNA

pET-30b and pQE-30 were obtained from Novagen and
Qiagen, respectively. Purified mouse monoclonal MRGS.His
antibody (Qiagen) that recognises pQE-30-derived His-tagged
proteins was used as per manufacturer's instructions. Ku70,
Ku80, and DNA-PKcs antisera were used as described previously
(Hartley et al., 1995; Finnie et al., 1996). Antibodies
against ligase I (TL5), ligase III (TL25) and ligase IV
(TL18) were used as described (Lasko et al., 1990; Robins
and Lindahl, 1996). Antigen-antibody complexes were
detected by enhanced chemi-luminescence (Amersham) according
to the manufacturer's instructions. HeLa nuclear extract
was obtained from Computer Cell Culture Centre, Mons,
Belgium. All plasmid constructs were verified by automated
DNA sequencing.

Expression and purification of XRCC4 derivatives 
To generate recombinant untagged XRCC4, the full-length
XRCC4 coding region was amplified by PCR from pBlueScript
containing the human XRCC4 gene and inserted into
pET-30a (Novagen) digested with Nde I/Sal I such that the
N-terminal His/S tags are not present and so that the XRCC4
stop codon prevents the addition of the C-terminal His-tag.
For protein expression, BL21(DE3) cells harbouring the
resulting plasmid (pET30XRCC4) were grown in 500 ml cultures
of LB/kanamycin (50 µg/ml) to mid-log prior to induction
with 0.4 mM IPTG for 4 h at 37°C. After lysing the
collected cell pellet by sonication, 30.2 g of ammonium
sulphate was added slowly per 100 ml of supernatant and
the mixture was incubated with stirring at 4°C for 30 min.
After centrifugation, the pellet was resuspended in TED (50 mM
Tris-HCl pH 7.5, 2 mM DTT and 1 mM EDTA) and dialysed
extensively against TED. Protein was then loaded onto a
heparin-Sepharose column pre-equilibrated with TED and
eluted with a linear gradient of 0 to 0.6 M
NaCl. Fractions containing XRCC4 were pooled, dialysed
against TED containing 1.0 M ammonium sulphate and then
loaded onto a pre-equilibrated phenyl-Sepharose column.
Proteins were eluted with a 100 ml linear gradient of 1.0
to 0 M (NH4)2SO4. Fractions containing XRCC4, eluting at
~0.2 M (NH4)2SO4, were pooled and dialysed against 50 mM
Tris-HCl pH 7.5, 2 mM DTT, 1 mM EDTA and 10% (w/v) glycerol,
and stored at -80°C.
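
The quantities in this purification scale linearly, which the
following Python sketch makes explicit. It is a minimal illustration
under stated assumptions: the function names and the 40 ml
supernatant volume are hypothetical, and the gradient calculation
gives only the nominal concentration of an ideal linear gradient,
ignoring column and tubing dead volume.

```python
def ammonium_sulphate_grams(supernatant_ml, grams_per_100_ml=30.2):
    """Mass of solid ammonium sulphate to add, scaled from the stated
    30.2 g per 100 ml of supernatant."""
    return supernatant_ml * grams_per_100_ml / 100.0


def gradient_concentration(volume_ml, total_ml=100.0,
                           start_molar=1.0, end_molar=0.0):
    """Nominal salt concentration after volume_ml of a linear gradient."""
    fraction = volume_ml / total_ml
    return start_molar + (end_molar - start_molar) * fraction


def volume_at_concentration(target_molar, total_ml=100.0,
                            start_molar=1.0, end_molar=0.0):
    """Gradient volume at which the nominal concentration equals target_molar."""
    return total_ml * (target_molar - start_molar) / (end_molar - start_molar)


# Hypothetical example: 40 ml of cleared lysate supernatant.
print(ammonium_sulphate_grams(40.0))
# Nominal position in the 100 ml, 1.0 to 0 M gradient of the ~0.2 M fractions.
print(volume_at_concentration(0.2))
```

On these assumptions, fractions eluting at about 0.2 M (NH4)2SO4
appear roughly four-fifths of the way through the 100 ml gradient.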

Anti-XRCC4 antibody production and purification

Regions of the XRCC4 gene were amplified by PCR from
pBlueScript containing the human XRCC4 gene, inserted
in-frame downstream of the hexa-histidine (His) tag
of pQE-30 (Qiagen, USA), and expressed and purified
according to the manufacturer's instructions from the
soluble fraction of bacterial lysates. Antibodies against
the soluble recombinant proteins were raised in rabbits
using standard procedures (Harlow and Lane, 1988) and are
available commercially from Serotec, UK. Western immunoblot
analyses were performed as described previously (Harlow and
Lane, 1988) and blots were developed by enhanced
chemi-luminescence (Amersham). Recombinant His-tagged
full-length XRCC4 was attached to Sulfolink Coupling Gel
(Pierce, USA) and was used to immuno-affinity purify
anti-XRCC4 antibodies from crude SJ4 serum as described
previously (Lakin et al., 1996).

Phosphatase treatment of HeLa cell extracts

To analyse for XRCC4 phosphorylation, HeLa nuclear
extract (50 µg) was treated with λ protein phosphatase (New
England Biolabs) in the presence of 2 mM MnCl2 and incubated
for 30 min at 30°C prior to SDS-PAGE and Western blotting.

Co-immunoprecipitations and ligase adenylylation assays
XRCC4 was immunoprecipitated from HeLa nuclear extract
using polyclonal anti-XRCC4 and a pre-immune control.
Specifically, HeLa nuclear extract was dialysed into Buffer
D* (20 mM HEPES-KOH, 20% (w/v) glycerol, 50 mM KCl, 2 mM
MgCl2, 0.2 mM EDTA, 1 mM DTT, 0.5 mM PMSF, 1 mM sodium
metabisulphite and 0.1% NP-40) and then incubated with
either pre-immune or immune serum for 1 h at 4°C in the
presence of 50 µg/ml ethidium bromide to disrupt
non-specific interactions (Lai and Herr, 1992). Immune
complexes were bound to protein A Sepharose beads
(Pharmacia), followed by extensive washing with Buffer D*
containing 0.15-1 M NaCl. The protein A Sepharose beads
were finally washed in Buffer D* containing 0.15 M NaCl
prior to analysis. The samples were then tested for the
ability to form DNA ligase-adenylate complexes as described
previously (Robins and Lindahl, 1996). The polynucleotide
substrates oligo(dT)-poly(rA) and oligo(rA)-poly(dT) were
prepared as described (Tomkinson et al., 1991). The
reactivity of the enzyme-adenylate intermediates formed was
examined by adding 0.8 µg of unlabelled oligo(dT)-poly(rA)
or oligo(rA)-poly(dT) for 1 h at 30°C. The reactions were
stopped by the addition of SDS sample buffer and
adenylylated proteins detected by autoradiography following
SDS-PAGE.

Gel-filtration chromatography

Total HeLa nuclear extract (6 mg protein) was dialysed
extensively against Buffer A (50 mM Tris-HCl pH 7.5, 1 mM
EDTA, 0.5 mM DTT, 10% (w/v) glycerol) containing 1 M NaCl.
The protein was then loaded onto a Superose 6 (Pharmacia)
column (60 x 1.5 cm) pre-equilibrated with Buffer A
containing 1 M NaCl. On an identical gel-filtration run,
0.2 mg of pure untagged recombinant XRCC4 was analysed in
Buffer A containing 1 M NaCl.

Purification of DNA ligase IV from HeLa cells

DNA ligase IV was purified from HeLa cells as described
previously (Robins, 1996). Fractions collected from each of
the columns were analysed by immunoblots with antibodies
specific for DNA ligases III and IV and XRCC4.

Expression of recombinant ligase IV derivatives 

For generation of recombinant ligase IV derivatives,
fragments of the human ligase IV gene coding region were
amplified by PCR from reverse transcribed HeLa RNA. Each
PCR product included a Bam HI site at the 5' end and a stop
codon followed by a Sal I site at the 3' end, and after
digestion was ligated into pET30b digested with Bam HI/Sal
I. The 550-844 fragment of ligase IV was also cloned into
pQE-30 and the resultant clone was expressed in E. coli
M15(Rep4). For in vitro transcription and translation of
ligase IV fragments, 1 µg of pET30LigIV(1-198),
pET30LigIV(199-549), pET30LigIV(550-844) or a luciferase
control (Promega) was transcribed in vitro and translated
using the TnT rabbit reticulocyte lysate kit (Promega)
according to the manufacturer's instructions. Resulting
N-terminally His-tagged ligase IV products were purified by
Ni2+-NTA agarose chromatography.

Briefly, a 100 µl bed volume of Qiagen Ni2+-NTA agarose
was pre-equilibrated in wash buffer (50 mM Tris-HCl pH 7.5,
20 mM imidazole, 10% (w/v) glycerol and 0.5 M NaCl) prior to
the addition of 20 µl of the crude [35S]methionine-labelled
in vitro translated ligase fragment. Unbound proteins were
removed after low-speed centrifugation and the resin was
washed 3 times with wash buffer to remove non-specifically
bound proteins. Finally, ligase IV proteins were eluted
with 100 µl of elution buffer consisting of 50 mM Tris-HCl
pH 7.5, 100 mM imidazole and 10% (w/v) glycerol.

Interaction assays between recombinant XRCC4 and ligase IV
derivatives

Full-length XRCC4 was immobilised on Sepharose-4B gel
beads (Pharmacia) using the cyanogen bromide method
according to the manufacturer's instructions. As a negative
control, coupling was also performed without XRCC4 protein.
A 30 µl bed volume of beads (with and without XRCC4) was
pre-equilibrated with Binding Buffer (50 mM Tris-HCl pH 7.5,
2 mM DTT, 1 mM EDTA, 10% (w/v) glycerol, 0.1% NP-40 and 0.36
mg/ml BSA) before the addition of the His-tag purified in
vitro translated ligase products or luciferase. Unbound
material was collected after centrifugation and the beads
were washed with Binding Buffer containing 0.1 M NaCl and
1.0 M NaCl. The beads were resuspended in gel loading
buffer and boiled in preparation for analysis by SDS-PAGE.
Finally, SDS-PAGE gels were fixed in 10% acetic acid for 30
min and dried, and labelled proteins were detected by
autoradiography. The ability of the recombinant C-terminal
fragment of ligase IV (residues 550-844) to bind
XRCC4-Sepharose beads was tested as above, except that
analysis was by Coomassie brilliant blue staining and
immunoblotting with the anti-ligase IV antibody.

EXAMPLE 2 

DETERMINATION OF BIOLOGICAL ACTIVITY OF DNA LIGASE IV 

Identification of a second, hitherto uncharacterised, yeast
DNA ligase

Using a consensus sequence within the core catalytic
domain of all published DNA ligases, the present inventors
searched through the recently fully sequenced S. cerevisiae
genome (Goffeau et al., 1996).

In addition to detecting CDC9, which encodes DNA ligase 
I, these searches identified an ORF (YOR005c) present on 
chromosome XV as a highly significant hit. 

This ORF encodes a 944 amino acid residue polypeptide
of predicted molecular mass of 109 kDa that exhibits
extensive similarity (24% identity; 43% similarity) in its
central region to the "core" ligase conserved domain of DNA
ligase I (Figure 3A). Phylogenetic analyses of protein
alignments over this region reveal that YOR005c is
considerably more related to DNA ligases of eukaryotes and
eukaryotic viruses than to those of prokaryotes and, in
particular, is most closely related to human ligase IV
(Figure 3B). Consistent with this, YOR005c shares an
N-terminal extension with the mammalian enzymes that is
lacking in prokaryotic DNA ligases (Figure 3C).
Furthermore, it possesses an additional C-terminal extension
that is homologous throughout its length to that of
mammalian ligase IV (Figures 3A and 3C). We thus conclude
that YOR005c encodes a homologue of mammalian DNA ligase IV
and designate this locus LIG4.
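
For readers unfamiliar with how a percentage identity figure such as
that quoted above is obtained, the following Python sketch shows one
common convention: identical residues divided by the number of aligned
columns that contain no gap. The function name and the toy alignment
are hypothetical and are not derived from YOR005c or ligase IV; other
conventions for the denominator exist, and a similarity score would
additionally require a substitution matrix, which is not shown.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of ungapped alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, for illustration only.
toy_a = "MKT-LVAG"
toy_b = "MRTALV-G"
print(round(percent_identity(toy_a, toy_b), 1))
```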

Disruption of LIG4 does not lead to marked hypersensitivity 
to a variety of DNA-damaging agents 

To study LIG4 function, we inactivated this gene in the
haploid yeast strain W303a by a one-step gene disruption.
Notably, resulting lig4 mutants do not have readily
observable growth defects when propagated at temperatures
ranging from 18°C to 37°C. This contrasts markedly with
CDC9, the gene encoding yeast ligase I, whose disruption
results in lethality due to an inability to progress through
S-phase (Johnston and Nasmyth, 1978). It is thus concluded
that LIG4 does not play an essential role in DNA
replication, and that yeast ligase I is the only DNA ligase
required for this process.

Next, we tested whether lig4 mutant yeast are defective
in any of the predominant DNA repair pathways by assessing
their sensitivity to killing by various DNA damaging agents.
Notably, lig4 mutant strains are not hypersensitive to DNA
damage induced by exposure to ultraviolet radiation, showing
that LIG4 is not essential in nucleotide excision repair. In
addition, strains disrupted for LIG4 are not hypersensitive
to a range of concentrations of the radiomimetic drug methyl
methanesulfonate (MMS) in the growth medium. Finally, lig4
mutant yeasts also do not display significantly elevated
sensitivity to killing by ionising radiation at a range of
doses (0 - 45 kRad; Figure 4A and data not shown).

Since radiation-induced DNA double-strand breaks (DSBs)
are repaired primarily by homologous recombination in S.
cerevisiae, these data suggest that LIG4 is not essential
for the latter process. Consistent with this, we have found
that the efficiency of homologous recombination-mediated
targeted integration into various loci in the yeast genome
is indistinguishable between wild-type and lig4 mutant
strains.



LIG4 functions in the Ku-dependent NHEJ pathway of DNA
double-strand break repair

In S. cerevisiae, radiation-induced DNA double-strand
breaks (DSBs) are repaired primarily by homologous
recombination, which is mediated by genes in the RAD52
epistasis group (Friedberg et al., 1995). Thus, disruption
of RAD52 sensitises yeast cells to ionising radiation
(Figure 4A). However, eukaryotic cells can also repair DNA
DSBs by a second pathway, non-homologous end-joining (NHEJ),
that utilises gene products distinct from those employed in
homologous recombination. In both yeast and mammals, one of
these components is the DNA-binding protein Ku, comprising
subunits of ~70 kDa and ~80 kDa [Ku70 and Ku80 in mammals
(Jackson and Jeggo, 1995); Yku70p/Hdf1p and Yku80p/Hdf2p in
yeast (Feldmann and Winnacker, 1993; Boulton and Jackson,
1996a; 1996b; Feldmann et al., 1996; Mages et al., 1996;
Milne et al., 1996; Siede et al., 1996; Tsukamoto et al.,
1996)]. NHEJ appears to be the predominant pathway for DSB
repair in mammals but represents a minor pathway in yeast;
consequently, disruption of S. cerevisiae YKU70 or YKU80
only results in significantly increased sensitivity to
ionising radiation or MMS when homologous recombination is
inoperative (Boulton and Jackson, 1996a; 1996b; Milne et
al., 1996; Siede et al., 1996).

We have found that lig4/rad52 double mutants are
considerably more radiosensitive than are strains disrupted
for RAD52 alone (Figure 4B). This provides indication that
LIG4 is involved in the repair of ionising radiation-induced
DNA damage and that it functions in a RAD52-independent
pathway.

Since the effects of disrupting LIG4 are similar to
those obtained by disrupting YKU70 or YKU80, we assessed the
radiosensitivity of lig4/yku70/rad52 triple mutants. Such
mutant strains are no more sensitive to ionising radiation
than are the lig4/rad52 or yku70/rad52 mutant strains
(Figure 4B).

Taken together, these data provide indication that Ku
and LIG4 function in the same DNA repair pathway.

Previous work has shown that Ku functions in DNA NHEJ
and that this process can be measured by employing an
in vivo plasmid repair assay (Boulton and Jackson, 1996a;
1996b; Milne et al., 1996). In this assay, a yeast-E. coli
shuttle plasmid, pBTM116 (Figure 5A; Boulton and Jackson,
1996a; 1996b), is linearised by restriction enzyme digestion
and then introduced into S. cerevisiae by transformation.
Since the plasmid must be recircularised to be propagated,
the number of yeast transformant colonies obtained
quantifies the ability of the strain to repair the plasmid.
Furthermore, since the DNA DSB generated in these studies
resides in a region that is not homologous to the yeast
genome, homologous recombination is suppressed and repair
operates predominantly via NHEJ.

We therefore analysed the ability of lig4 mutant yeasts
to repair pBTM116 after cleavage with various restriction
endonucleases.

As with strains disrupted for YKU70 or YKU80, lig4
mutant strains are severely impaired in plasmid NHEJ, and
this is observed with both 5' and 3' overhanging DNA ends
(Figure 5B). Interestingly, these studies reveal that the
effect of lig4 mutations is less pronounced with 3'
overhanging DNA ends than it is with 5' overhanging ends.
Although other alternatives exist, it is possible that this
reflects differences in the mechanisms by which the two
types of DNA ends can be repaired or is due to differential
sensitivities of the different end structures to nuclease
attack. Notably, DNA repair is not impaired further in
yku70/lig4 double mutant strains (Figure 5B). In addition,
although the precise reason for this effect is not known, as
is the case for yku70 or yku80 mutants, we have found that
lig4 mutant yeasts have a slightly elevated ability to
rejoin pBTM116 bearing blunt ends.

Taken together, these results reveal that Lig4p plays a
crucial role in the repair of plasmid molecules bearing
cohesive DNA double-strand breaks in vivo. Secondly, they
show that, although purified DNA ligase I (CDC9) has been
shown to be capable of catalysing DSB joining in vitro
(Tomkinson et al., 1992), this enzyme does not play a major
role in this pathway as assayed by in vivo plasmid DSB
rejoining, and is unable to substitute efficiently for Lig4p
in this process. Finally, these results also show that Lig4p
plays an important role in the Ku-dependent NHEJ pathway.

Although plasmid repair is reduced dramatically in lig4
mutant strains, it is not abolished. To determine precisely
the types of DNA repair events that are dependent on or
independent of LIG4, repaired plasmids were recovered and
then analysed by restriction enzyme digestion and DNA
sequencing (Figure 5C). Of the large number of plasmids
recovered from parental strains, all had been repaired by
direct ligation of the cohesive DNA termini, thus
regenerating the restriction enzyme cleavage site (Boulton
and Jackson, 1996a; 1996b; Figure 5C). Plasmid repair
products recovered from yku70 or lig4 mutant strains,
however, were found to fall into several categories. Some of
these corresponded to "gap repair" products which we have
shown are generated via RAD52-dependent homologous
recombination with yeast genomic DNA (sequence analyses
reveal that homologous recombination is employed in the
generation of these products and their production is
abolished by disruption of RAD52; Boulton and Jackson,
1996a; 1996b and data not shown). This therefore provides
further evidence that LIG4, like YKU70 and YKU80, does not
play a crucial role in homologous recombination processes.
In yku70 or yku80 mutant strains, virtually all of the
residual repair products were found to have suffered
deletions (Boulton and Jackson, 1996a; 1996b; Figure 5C). In
contrast, although many of the residual repair products
generated in lig4 mutants had also suffered deletion of
terminal sequences, some were rejoined accurately (Figure
5C).

Collectively, these results provide insights into the
distinct roles performed by Lig4p and Ku in DNA NHEJ (see
Discussion below).

Lig4p, unlike Ku, does not appear to function in telomere
length maintenance

Telomeres occur at the ends of eukaryotic chromosomes,
are structurally distinct, and have unusual replication
intermediates for which it is unclear whether a distinct DNA
ligase is necessary (Blackburn, 1991; Zakian, 1995; Lundblad
and Wright, 1996). Recent work has demonstrated that Ku
functions in telomere homeostasis, since disruption of
either YKU70 or YKU80 results in a dramatic reduction in
telomeric length (Boulton and Jackson, 1996b; Porter et al.,
1996).

Given that Lig4p and Ku function together in DNA NHEJ,
we tested whether LIG4 is involved in telomere length
control.

To do this, yeast genomic DNA was digested with the
restriction enzyme XhoI, which in wild-type strains produces
a predominant telomeric fragment of ~1.3 kb, including ~400
bp of repeating (C1-3A) sequence. This fragment is detected
by Southern hybridisation using a radiolabelled poly(GT)20
oligonucleotide probe that binds to the repetitive telomeric
sequences. Notably, whereas disruption of YKU70
results in telomeric shortening, loss of LIG4 function has
no detectable effect.

These data therefore reveal that, although Ku and Lig4p
function together in DNA DSB repair, Ku but not Lig4p has an
additional essential function in telomere length
homeostasis.



MATERIALS AND METHODS 

Gene disruptions

Full-length LIG4 was amplified by PCR with primers
LIG4-1 and LIG4-2 (5' TCAGTAGTTGACTACGGGAAAGTCT 3' and 5'
ATGATATCAGCACTAGATTCTATAC 3', respectively) using the Expand
High Fidelity DNA polymerase (Boehringer Mannheim). After
cloning into pGEM-T (Promega), the resultant plasmid was
digested with EcoRI, treated with Pfu DNA polymerase and
then digested with XbaI. The HIS3 marker was inserted to
replace the LIG4 ORF between residues 289 and 592. The
disruption fragment was excised with SphI and SpeI and was
used to transform the appropriate yeast strains to His+.
Gene disruption was verified by PCR using LIG4 and HIS3
primers. Two RAD52 disruption constructs were provided by D.
Weaver and have TRP1 and URA3 selection, respectively.
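
As a quick sanity check on the two primers quoted above, their
length, GC content and reverse complement can be computed directly.
The Python sketch below is illustrative only; the helper names are
hypothetical, and the sequences are those given in the text, written
in upper case.

```python
# Primer sequences as quoted in the text (5' to 3'), upper-cased.
LIG4_1 = "TCAGTAGTTGACTACGGGAAAGTCT"
LIG4_2 = "ATGATATCAGCACTAGATTCTATAC"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


for name, seq in (("LIG4-1", LIG4_1), ("LIG4-2", LIG4_2)):
    print(name, len(seq), round(gc_percent(seq), 1), reverse_complement(seq))
```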

Assessment of sensitivity to temperature and DNA damaging
agents

Aliquots (15 µl) of serial 5-fold dilutions of mid-log
phase yeast cultures were spotted onto YPDA plates and were
grown for 36 h at 30°C or 37°C. Strains on one plate were
exposed to 50 J/m² ultraviolet (UV-C) radiation
(Stratalinker; Stratagene). On another plate, YPDA medium
contained 0.0025% methyl methanesulfonate. In other studies,
lig4 mutant strains did not display hypersensitivity to MMS
(0.0005% and 0.005% in the growth medium) nor to UV-C (20 -
150 J/m²).
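
The spotting scheme can be made concrete with a short calculation.
The Python sketch below is a minimal illustration: the starting
culture density (1e7 cells/ml) and the number of dilution steps (five)
are assumptions chosen for the example, since neither is stated in the
text, and the function names are hypothetical.

```python
def serial_dilution_factors(fold=5, steps=5):
    """Cumulative dilution factor of each spot, starting with undiluted culture."""
    return [fold ** i for i in range(steps)]


def cells_per_spot(culture_cells_per_ml, spot_volume_ul=15.0, fold=5, steps=5):
    """Approximate number of cells deposited in each spot of the series."""
    spot_ml = spot_volume_ul / 1000.0
    return [
        culture_cells_per_ml * spot_ml / factor
        for factor in serial_dilution_factors(fold, steps)
    ]


# Assumed starting density of 1e7 cells/ml (hypothetical).
print(serial_dilution_factors())
print(cells_per_spot(1e7))
```

Each successive spot therefore carries one-fifth of the cells of the
previous one, so a difference in growth across several spots reflects
a large difference in viability.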

Ionising irradiation survival assays

Three independent isolates of each strain were
inoculated either into minimal media lacking the appropriate
amino acid(s) or into YPDA and were grown overnight at 30°C.
Cultures were diluted in sterile water to an OD600nm value
equivalent to 1 x 10^ cells/ml and 1 ml aliquots were
irradiated using a 137Cs source at a dose rate of 0.18
kRad/min. Irradiated samples and unirradiated controls were
then diluted and plated in duplicate using an automated
spiral plater (Whitley) on YPDA or minimal media. Colony
numbers were ascertained following incubation at 30°C for
3-4 days.
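
The stated dose rate fixes the exposure time for any given dose, and
survival is then the viable titre of the irradiated sample relative to
the unirradiated control. The Python sketch below illustrates both
calculations. The colony counts and dilutions are hypothetical, the
function names are illustrative, and equal plated volumes are assumed
for irradiated and control samples.

```python
DOSE_RATE_KRAD_PER_MIN = 0.18  # dose rate stated in the text


def exposure_minutes(dose_krad, dose_rate=DOSE_RATE_KRAD_PER_MIN):
    """Irradiation time needed to deliver a given dose."""
    if dose_rate <= 0:
        raise ValueError("dose rate must be positive")
    return dose_krad / dose_rate


def surviving_fraction(irradiated_colonies, irradiated_dilution,
                       control_colonies, control_dilution):
    """Viable titre of the irradiated sample relative to the control.

    The dilution arguments are the fold-dilutions applied before plating;
    equal plated volumes are assumed."""
    control_titre = control_colonies * control_dilution
    if control_titre <= 0:
        raise ValueError("control titre must be positive")
    return (irradiated_colonies * irradiated_dilution) / control_titre


# Time to deliver the highest dose mentioned in the text (45 kRad).
print(exposure_minutes(45.0))
# Hypothetical counts: 30 colonies at 10-fold vs 150 colonies at 1000-fold.
print(surviving_fraction(30, 10, 150, 1000))
```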

Plasmid repair assay

Plasmid repair assays were performed as described
previously (Boulton and Jackson, 1996a; 1996b). Briefly, the
yeast-Escherichia coli shuttle plasmid pBTM116 (2-5 µg),
which contains TRP1 for selection in yeast, was digested
with the appropriate restriction enzyme to completion and
the enzyme was inactivated by treatment at 65°C for 20 min.
Linearised DNA was then used to transform yeast by the
lithium acetate method (Ausubel et al., 1987). Parallel
transformations were performed with an equivalent amount of
uncut plasmid to enable normalisation for differences in
transformation efficiency. Diluted samples were plated in
duplicate on minimal media lacking the appropriate amino
acids, and colonies were counted following incubation at
30°C for 3-4 days. To analyse plasmid repair products, DNA
from single yeast transformants was isolated via the Yeast
DNA Isolation kit (Stratagene) and this was used to
transform E. coli XL1-Blue cells (Stratagene) to ampicillin
resistance. Plasmid DNA was then isolated and was analysed
by restriction enzyme digestion and by DNA sequencing.
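
The normalisation step described above amounts to dividing the number
of transformants obtained with linearised plasmid by the number
obtained with the same amount of uncut plasmid, and then expressing
the mutant value relative to wild type. The Python sketch below is a
minimal illustration of that arithmetic. The colony counts are
hypothetical and the function names are not taken from the source.

```python
def repair_efficiency(linear_colonies, uncut_colonies):
    """Transformants from linearised plasmid per transformant from uncut plasmid."""
    if uncut_colonies <= 0:
        raise ValueError("uncut plasmid control must yield colonies")
    return linear_colonies / uncut_colonies


def relative_to_wild_type(mutant_counts, wild_type_counts):
    """Mutant repair efficiency as a percentage of the wild-type value.

    Each argument is a (linear_colonies, uncut_colonies) pair."""
    mutant = repair_efficiency(*mutant_counts)
    wild_type = repair_efficiency(*wild_type_counts)
    if wild_type <= 0:
        raise ValueError("wild-type repair efficiency must be positive")
    return 100.0 * mutant / wild_type


# Hypothetical colony counts: (linearised plasmid, uncut plasmid).
wild_type = (400, 1000)
lig4_mutant = (10, 800)
print(relative_to_wild_type(lig4_mutant, wild_type))
```

Normalising to the uncut control in this way separates a genuine
defect in end-joining from a strain-specific difference in
transformation competence.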

Yeast DNA extraction and analyses of telomeric DNA

Genomic DNA from S. cerevisiae was isolated essentially
as described (Ausubel et al., 1987). For telomere analyses,
2 µg of genomic DNA was digested with 30 U of XhoI
(Boehringer Mannheim) at 37°C overnight. The digested DNA
was then separated on a 1.2% agarose 1 x TAE gel and was
transferred to Hybond Nfp+ membrane (Amersham) by capillary
transfer in 20 x SSC as suggested by the manufacturer.
Membranes were pre-hybridised in 0.5 M sodium phosphate, pH
7.2, 1% SDS and then hybridised with 3 ng/ml of
32P-end-labelled poly(GT)20 oligonucleotide (specific
activity of >10^/µg) in a Church-based buffer (0.2 M sodium
phosphate, pH 7.2, 1% BSA, 6% polyethyleneglycol 6000, 1%
SDS) overnight at 62°C. Finally, membranes were washed twice
at room temperature for 30 min in 0.2 M sodium phosphate,
0.1% SDS, then exposed to pre-flashed X-ray film at -70°C.

DISCUSSION 

The inventors' work with XRCC4 indicates that it is a
predominantly nuclear protein. Moreover, through a variety
of approaches, we have demonstrated that XRCC4 mediates
extremely tight and specific interactions with DNA ligase
IV. For example, these two components co-immunoprecipitate
highly specifically with one another from HeLa cell
extracts, even in the presence of 1 M NaCl. Furthermore,
such interactions are not abrogated by ethidium bromide,
indicating that the interaction between XRCC4 and ligase IV
is not mediated by a DNA intermediate. Indeed, we show that
bacterially expressed XRCC4 and ligase IV also bind to one
another tightly, revealing that their interaction is direct.
In addition, XRCC4 and ligase IV co-purify over every
chromatographic fractionation procedure we have employed,
including gel-filtration in the presence of 1 M NaCl, anion
and cation exchange chromatography, and hydrophobic
interaction chromatography. Indeed, so far we have only
resolved these two proteins by the addition of harsh ionic
detergents.

The fact that XRCC4 interacts tightly with ligase IV
but not with the other DNA ligases that we have analysed has
led us to investigate the basis for this binding
specificity. Notably, although all characterised mammalian
DNA ligases contain a common highly related core catalytic
region, each possesses unique N- and/or C-terminal
extensions. We have found that it is the unique C-terminal
domain and not the ligase catalytic region of ligase IV that
interacts with XRCC4. Interestingly, this region of ligase
IV contains two tandem copies of the weakly conserved BRCT
homology domain (Koonin et al., 1996; Callebaut and Mornon,
1997), leading one to speculate that it is one or both of
these domains that mediate the interaction with XRCC4. BRCT
domains also exist in a variety of other factors and are
required for those factors to interact (Mackey et al., 1997;
Nash et al., 1997), suggesting that the BRCT domain of one
protein interacts with a BRCT domain of the other. Thus, in
light of our work showing interaction between XRCC4 and DNA
ligase IV it might be expected that XRCC4 would also possess
one or more copies of the BRCT consensus. Although XRCC4
has not been identified as a BRCT domain-containing protein
by previous analyses, we have used manual and computer-aided
inspections of the XRCC4 sequence to reveal limited
homologies to other BRCT domains, suggesting that XRCC4
might contain one or more divergent copies of this putative
protein structural unit.

Given that XRCC4 clearly functions in DNA NHEJ, our
data indicate that DNA ligase IV also plays a crucial role
in this process. XRCC4 may serve as a molecular bridge to
target ligase IV to DNA DSBs, perhaps through XRCC4 also
interacting with other components of the DNA NHEJ apparatus
(Figure 7). In regard to such a putative bridging function
for XRCC4, it is worthy of note that immunoprecipitation
studies suggest that XRCC4 can interact with Ku and/or
DNA-PKcs, although these interactions only occur at low salt
concentrations and hence are weak compared to those
exhibited between XRCC4 and ligase IV. A possible physical
linkage between XRCC4 and DNA-PK is attractive in light of
the fact that we have shown that HeLa cell XRCC4 is a
phospho-protein and is an effective substrate for DNA-PK in
vitro. Since XRCC4 possesses a DNA-PK kinase consensus
motif (Li et al., 1995), mutation of this site may affect
XRCC4 function in vivo.

Consistent with the proposal that ligase IV plays an
important role in DNA DSB repair, we have identified a S.
cerevisiae homologue of DNA ligase IV (displaying extensive
sequence similarity along its length with mammalian DNA
ligase IV) and have shown that inactivation of this factor
debilitates DNA NHEJ in a manner that is epistatic with
mutations in the yeast homologues of Ku70 and Ku80 (Boulton
and Jackson, 1996; Boulton and Jackson, 1996; Teo and
Jackson, 1997). By contrast, we find that yeast ligase IV
does not appear to play an essential role in the repair of
ultraviolet light-induced DNA damage nor in the repair of
DNA DSBs by homologous recombination (Teo and Jackson,
1997). Taken together with the data on XRCC4, this provides
indication that ligase IV is dedicated to DNA NHEJ and that
this function is conserved throughout the eukaryotic
kingdom.

The yeast gene, which we have designated LIG4, is not
essential for DNA replication, RAD52-dependent homologous
recombination, nor the pathways of nucleotide excision repair
and base excision repair. Instead, we have shown that LIG4
is specifically involved in the rejoining of DNA
double-strand breaks by the process of DNA NHEJ, which does not
demand homology between the two recombining DNA molecules
and does not require RAD52. Notably, genetic epistasis
experiments reveal that LIG4 acts in the same DNA repair
pathway as Ku, a nuclear protein that specifically
recognises DNA strand breaks. We have thus identified a
novel S. cerevisiae DNA ligase and have shown that it is
involved specifically in the Ku-dependent NHEJ pathway of
DNA DSB repair.

In light of this, and given that mutations in YKU70 or
YKU80 result in dramatic telomeric shortening in yeast
(Boulton et al., 1996b; Porter et al., 1996), we have also
assessed the potential involvement of LIG4 in telomere
length homeostasis. Telomeres are the protein-DNA structures
at the ends of eukaryotic chromosomes that ensure the
complete replication of chromosome ends, protect these ends
from degradation, and prevent chromosomal termini from
activating DNA damage signalling pathways or engaging in
fusion and recombination reactions with other loci (for
reviews see Blackburn, 1991; Zakian, 1995; Lundblad and
Wright, 1996). In most organisms, telomeres are composed of
variable numbers of simple repeat sequences and, at least in
S. cerevisiae, the length of these sequence arrays is
maintained by a combination of telomerase activity and
RAD52-dependent and -independent recombination. In yeast,
deficiencies in Ku result in an approximately 70% reduction
in the number of telomeric repeat sequences (Boulton et al.,
1996b; Porter et al., 1996). Given that Ku binds to the ends
of double-strand DNA (Mimori and Hardin, 1986; Paillard and
Strauss, 1991), one possibility is that Ku may interact
directly with telomeric DNA ends and potentiate telomere
lengthening by protecting telomeric DNA termini from
nucleases or by augmenting telomerase recruitment. An
alternative explanation is that the effect of Ku
inactivation on telomere length is indirect; perhaps the
DNA repair defects that are associated with Ku-deficient
yeasts result in changes in cell physiology that impinge
indirectly on telomere length control. Although it is not
possible at present for us to identify precisely how Ku
affects telomere length, the fact that mutations in LIG4
produce essentially the same DNA repair defect as mutations
in Ku but do not alter telomere length argues for a specific
role for Ku in telomere homeostasis that is distinct from its
activities in DNA DSB repair. In this regard, it will be of
interest to see whether mutated derivatives of Ku can be
generated that have no effect on DNA repair but do result in
defective telomeric maintenance.

Yeast cells mutated in LIG4 have pronounced defects in
DNA NHEJ, showing that Lig4p plays a crucial role in this
process that cannot be complemented efficiently by yeast DNA
ligase I. Conversely, yeast CDC9 and human DNA ligase I
mutants are defective in DNA replication and, at least in
vitro, this function is not performed efficiently by other
enzymes. This indicates that yeast DNA ligases I and IV have
distinct and largely separate cellular functions and cannot
substitute effectively for one another. Thus, DNA ligase I
plays a crucial role in DNA replication and also appears to
seal single-strand DNA breaks that are the end-products of
nucleotide- and base-excision repair, and moreover, is
likely to complete recombination events between homologous
duplex DNA molecules. There are also data suggesting that
mammalian DNA ligase III is specialised towards particular
functions. One splice variant (DNA ligase III-α) may operate
in a separate pathway for base excision repair while another
variant (DNA ligase III-β) has been implicated in meiotic
recombination. Notably, there are no obvious homologues of
mammalian DNA ligase II/III in S. cerevisiae. However,
sequence analyses (Fig 1; Colinas et al., 1990; Kerr et al.,
1991; Husain et al., 1995) reveal that these ligases are
related more closely to DNA ligases encoded by cytoplasmic
poxviruses than they are to DNA ligase I, suggesting that
ligases II and III may have arisen fairly recently in
vertebrate evolution. Interestingly, and largely consistent
with the proposed functions for mammalian ligase III,
inactivation of poxvirus DNA ligase does not affect viral
DNA replication or recombination but renders the mutant
virus more sensitive to DNA damage induced by UV or
bleomycin (Colinas et al., 1990; Kerr et al., 1991).
Collectively, these data suggest that DNA ligase I and
perhaps DNA ligase II/III are involved predominantly in the
rejoining of single-stranded nicks whereas DNA ligase IV is
the major enzyme catalysing the joining of double-stranded
breaks.

In light of these points, and given that LIG4 functions
in the highly evolutionarily conserved Ku-dependent NHEJ
pathway, mammalian DNA ligase IV is indicated as having a
key role in Ku-dependent DNA DSB rejoining. As is the case
for Ku (reviewed in Jackson and Jeggo, 1995), deficiency in
mammalian ligase IV may result in cellular radiosensitivity
and an inability to rejoin site-specific V(D)J recombination
intermediates.

Although the available data suggest diversification of
function for the different eukaryotic DNA ligases, it is
unclear whether this arises from intrinsic differences in
catalytic activity or from differences conferred, for
example, by the distinct C- and N-terminal extensions of the
enzymes. At least in vitro, purified human DNA ligases I,
III and IV show differing capacities to join single-stranded
breaks in hybrid polynucleotide substrates (Arrand et al.,
1986; Tomkinson et al., 1991; Robins and Lindahl, 1996).
Furthermore, purified mammalian DNA ligases differ in their
abilities to rejoin DNA DSBs. It is noteworthy, however,
that in contrast with the available in vivo data, these
studies show that purified ligase I but no other mammalian
DNA ligase is able to catalyse the joining of blunt DNA ends
effectively in vitro (Arrand et al., 1986; Tomkinson et al.,
1991; Tomkinson et al., 1992; Robins and Lindahl, 1996). One
possible explanation for this discrepancy between the in
vitro and in vivo data is that at least some of the
eukaryotic DNA ligases may not have high intrinsic affinity
for DNA and, within the cell, are targeted to appropriate
DNA lesions by accessory factors. Consistent with this is
the identification herein of strong interaction between DNA
ligase IV and XRCC4.

Inactivation of either yeast Ku or Lig4p results in
a similar dramatic reduction of NHEJ in the in vivo plasmid
DNA DSB repair assay. Because of this and since the level
of DNA repair does not fall further in yeast strains
defective in both Ku and Lig4p, we conclude that these two
factors function in the same illegitimate recombination
pathway. However, it is apparent that Ku and Lig4p have
distinct functions in DNA NHEJ, as evidenced by the
different spectra of residual plasmid repair products that
are generated in the respective mutant strains. Thus,
whereas nearly all the residual plasmid repair products
arising in yku70 mutants suffer deletions, in lig4 mutant
strains these correspond to a mixture of deletion products
and products generated by accurate DNA end-joining.
Collectively, these results suggest that Ku may function in
at least two ways to potentiate DNA repair. Firstly, it may
protect exposed DNA ends from nuclease attack. Secondly, it
might serve to specifically recruit Lig4p, directly or
indirectly, to the sites of DNA damage, perhaps via the
Lig4p C-terminal extension that is absent from DNA ligase I.
Consequently, the phenotypes of strains defective in Ku or
Lig4p can both be explained as resulting from an inability to
target a ligase to DNA DSBs efficiently. In Ku-deficient
strains, the ready access of nucleases to the DNA ends may
lead to deletions in virtually all the residual NHEJ repair
products, which presumably arise via inefficient DNA end
joining by untargeted ligase I or Lig4p. In contrast, when
Lig4p is absent, Ku is still able to protect the DNA ends
and this can explain how some accurate repair can still
occur, this presumably being mediated by DNA ligase I.
However, the reduced repair kinetics in lig4 mutant yeast
may mean that, even in the presence of Ku, nucleases
ultimately gain access to the DNA termini and lead to
deletions in a large proportion of the residual repair
products. Consistent with the above model, we find that
virtually all of the residual NHEJ products generated in
yku70/lig4 double mutants have sustained terminal deletions.
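
By way of illustration only, the epistasis reasoning above (two factors are placed in the same pathway when the double mutant is no more defective than the more severe single mutant) can be sketched in Python. The function name, the tolerance value and the repair efficiencies used below are hypothetical assumptions introduced for this sketch; they are not measurements from the present work.

```python
def same_pathway(wild_type, mutant_a, mutant_b, double_mutant, tolerance=0.2):
    """Epistasis check: True when the double mutant repairs about as well
    as the more defective single mutant (no further drop in repair)."""
    # Express every repair efficiency relative to the wild-type strain.
    a = mutant_a / wild_type
    b = mutant_b / wild_type
    ab = double_mutant / wild_type
    worst_single = min(a, b)
    # Allow a relative tolerance for experimental scatter (assumed value).
    return ab >= worst_single * (1.0 - tolerance)


# Hypothetical numbers for illustration only, NOT data from this work.
epistatic = same_pathway(100.0, 5.0, 6.0, 5.0)    # double mutant no worse
independent = same_pathway(100.0, 5.0, 6.0, 0.3)  # double mutant far worse
print(epistatic, independent)
```

A comparison of this kind addresses only the overall level of repair; it does not capture the difference in the spectra of repair products discussed above, which is what distinguishes the roles of Ku and Lig4p within the pathway.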

Interaction between XRCC4 and Ku/DNA-PKcs complex

XRCC4 interaction with DNA-PKcs/Ku was demonstrated by
incubation of HeLa cell nuclear extract with anti-XRCC4 or
pre-immune antiserum, with purification of the resulting
immunocomplexes by adsorption onto protein A-Sepharose, then
analysis by Western immunoblotting.

Both DNA-PKcs and the two subunits of Ku are
immunoprecipitated by the anti-XRCC4 antiserum but not the
pre-immune serum.

In these studies, immunocomplexes were washed under
relatively mild conditions of 0.25 M NaCl and 0.1%
Nonidet-P40. However, when more stringent washes were employed
(for example, in the presence of 1 M NaCl, 0.1% Nonidet-P40 and
50 µg/ml ethidium bromide) the interaction between XRCC4 and
Ku/DNA-PKcs complex was abolished.

Taken together, these data reveal that although the
interaction between Ku/DNA-PKcs and XRCC4 appears specific,
it is relatively weak.

The interaction may be inhibited using appropriate
agents including peptide fragments of the respective
proteins. Such agents may be identified and obtained using
assay methods and used in therapeutic and other contexts, as
disclosed above.

CCX^GAAGTGGGGCTGCCTCTTTAAATAACAAAAATCT^

1 + + + + + + 60 

GGCCTTCACCCCGACGGAGAAATTTATTGTTT^ 

c GSGAASLNNKNLRY*EMERK- 



c 



AAATAAGCAGAATCCACCTTGTTTCroAAC^^ 

61 + + + + + + 120 

TTTATTCGTCITAGGTGGAACAAAGACTTGGC 

ISRIHLVSEPSITHFLQVS.W- 

GGGAGAAAACACrrGGAATCTGGTTTTGTTATTA^ 

121 + + + + + 180 

CCXrrcTTTTGTGACCTTAGACaW^CA^ 

c E K T Jj E S G F V I T L T D G K 3 A W T - 

CTCGGACAGTTTCTGAATCAGAGATTTCCCAAGAAC^^ 

181 + + + — — + + + 240 

GACCCTGTCAAAGACTTAGTCTCTAAAGGGTTCT^^ 

c GTVSESEISQEADDMAMEKG- 

GGAAATATGTIGGTGAACTXSAGAAAAGCATTGTTGT^ 

241 + + + + + + 300 

CCTTTATACAACCACTTGACTCTTTTCGTAAC/^ 

c KYVGEL RKALLSGAGPADVY- 

ACACGTTTAATTTTTCTAAAGAGTCrTGTT
TGTGCAAATTAAAAAGATTTCTCAGAACAATAAAGAAGAAACT^^

^ TFNFSKESCYFFFE KNLKDV- 

TCTCATTCAGACTTGGTIXXriTCAACCTAG^ 

361 + + + + + + 420
c SFRLGSFNLEKVENPAEVIR-

GAGAACTTATITCTTATTGCTTGGACACCATTC^ 

421 + + + + + + 480 

CTCTTGAATAAACAATAACGAACCTGTGGTAACGTC^^ 

c ELICYCLDTIAENQAKNEHIi- 

TGCAGAAAGAAAATGAAAGGCTTCTGAGAGATTGGAATGATGT^
ACGTCTTTCTTTTACITTCCXS^

c QKENERLLRDWNDVQGRFEK- 

AATGTGTGAGTCCTAAGGAAGCTTTGGAGACTGATCT^ 

541 + + + + + + 600 

TTAGACACTCACGATTCCTKXSAAACCTC^^ 

c CVSAKEALETDLYKRFILVL-
TGAATXjAGAAGAAAACAAAAATCAGAAGTTTGCATAATAAATTAT^

601 + + + + + + 660 

ACITACTCTTCTTTTGTTTTTAGTCT^ 

c NEKKTKIRSLHNKLLNAAQE- 

AACGAGAAAAGGACATCAAACAAGAAGGGGAAACTGCAATCrGTT^^ 

661 + + + + + + 720 

TTGCTCTTTTCCTCTAGTTTGTTCr^ 

c REKDIKQEGETAICSEMTAD- 

ACCXSAGATCXIAGTCTATGATGAGAGTACTGATGAGG^ 

721 + + + + + + 780 

TCGCTCTAGGTCAGATACTACTCTCATGACTACI^^ 

c RDPVYDESTDEESENQTDLS- 

CTCGGOTXXXriTXrAGCTGCTGTAAGTAAAGA^^ 

781 + + + + + + 840 

GACCCAACCGAAGTCGACGACATTCATTTCTACT 

^ GLASAAVSKDDSIISSLDVT- 

CTCATATIXX^ACCAAGTAGAAAAAGGAGACAGCGAATGC^^ 
841 + + + + + 900 

gactataacgtggttcatctttttat^^ 
c diapsrkrrqrmqrnlgtep- 
ctaaaatcgctcctcaggagaatcagcttcaaga;^^ 

901 + + + + + + 960 

GATTTTACCGAGGAGTCGriCITAGTCGAAGT^^ 

c KMAPQENQLQEKEKPDSSLP- 

CTGAGACGTCGAAAAAGGAGCACATCTCAGCTGAAAAC^ 

961 + + + + + + 1020 

GACTCTGCAGCTTTTTCCTCGTGTAGAGTCGACTT^^ 

c ETSKKEHISAENMSLETLRN- 

ACAGCAGCCCAGAAGACXnXriTTCATGAGAT^ 
1021 + + + + + + 1080 

c SSPEDLFDEI*QSQKIL*CS- 

CACTAGACTATGTTTrcTATTCATTTCT^ 

1081 + + + + + + 1140 

GTGATCTGATACAAAAGATAAGTAAAGAAATTTTACrrTT^^ 

c LDYVFYSFL*NEKGEFQVSS- 

GCCGCTATTACCGTATCTTACAATTTAATTACATACACAGT^ 

1141 ' + + + + + + 1200 

CGGCGATAATGGCATAGAATGTTAAATTAATGTATGTGTCACTO 

c RYYRILQFN YIHSELKPLCK- 

AAATGGATTACACATGTATACMAGATACXSATTTGATGATXS^ 

1201 + + + + + + 1260 

TTTACCTAATGTGTACATATGTTTCTATGCTAAACTACTAC^ 

c MDYTCIQRYDLMMTLAH*VL- 



TAAACTATTCATTGAGCATGCCTATAATTACATAAATTGTATGAGACITT^^
1261 + + + + + + 1320

ATTOSATAAGTAAGTCGTAaSGATATTAATGTATTTAACATAC^^ 

c NYSFSMPIIT*IV*DFLLQR- 

GGACACATTTATCATATTCATTCACACATATTATATGT^ 

1321 + + + + + + 1380 

CCTGTGTAAATAGTATAAGTAAGTGTGTATAATATACACTATCGACAGGTTGTAGGAC^ 

c THLSYSFTHIICDSCPTSCL- 

txxsgaagattttgaaaacaggacaaagaaaacatcatitt;^^ 

1381 + + + + + + 1440 

ACCCTTCTAAAACTTTrGTCCTGTTTC^ 

c GRF*KQDKENIILKCLQLFL- 

TCAATAGACGTATTCAAACATATTCTGAACAT^ 

1441 + + + + + *-+ 1500 

ACTTATCTGCATAAGTTTGTATAAGACrrTGTAACTAC^^ 

NRRIQTY SEH*CLNILICVM- 

TGATGTAGAAAATATAATTTTAGTTTGTACATAAACAT^ 

1501 + + + + ^ + 1560 

ACTACATCITTTATATTAAAATCAAACATGTAT^ 

c M*KI*F*FVHKHCENLIIKF- 

TTTTGATACATTGAAAAAAAAA 

1561 + +~ 1582 

AAAACTATGTAACTTrTTTTTT 

c LIH*KK 
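
The final block of the sequence above (positions 1561 to 1582) illustrates the layout used in these sequence figures: one DNA strand, a position ruler, the complementary strand, and a one-letter translation. As a hedged sketch, the relationship between the three rows can be reproduced in Python using the standard genetic code. The helper names are hypothetical, and reading the letter that precedes the translation row as denoting the third reading frame is an assumption made for this example.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the base-paired strand, written in the same left-to-right order."""
    return seq.translate(COMPLEMENT)


def translate(seq, frame):
    """Translate seq in reading frame 1, 2 or 3; '*' marks a stop codon."""
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Upper strand of positions 1561-1582 as printed in the figure above.
top_strand = "TTTTGATACATTGAAAAAAAAA"
print(complement(top_strand))    # AAAACTATGTAACTTTTTTTTT
print(translate(top_strand, 3))  # LIH*KK
```

Running the same two functions over a less legible block offers one way to check a scanned row against its neighbours, although it cannot restore bases lost where a printed line is truncated.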
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CCACAGCGCTGTAGACTGCGCCGCATTAGAAGCCTOGC^^ 

1 + + + + + + 60 

GGTGTCGCGACATCTGACGCGGCGTAATCTTCGGACC^^ 

a PQRCRLRRIRSLAS*CCALH 

CTAGACCCAAGCCCCAGGTCX3TGGGACX3ATTTCTCCCC^^ 

61 + + + + + + 120 

GATCTGGGTTCGGGGTCCAGCACCCTGCTAAAGAGGGCAA^^ 

a LDPSPRSWDDFSRF*LPGTV, ^- 

TTGCCTGCTTTACXTGCGTACATGTTGATT^ 

121 + + + + + 180 

AACGGACGAAATGGACGCATGTACAACTAAGAAAGAGTACCXSTI^^ 

a LPALPAYMLILSHGNPAGNH 

181 + + + + + + 240 

GTTCTAGAGTAAAATGTCGACCCTAAGAGACCAAGTGTCTCCAl^^ 

a QDLILQLGFSGSQR*RSLPE 

GCCAGTTAAACGAGAAGATTCATCACCGCTTTGAT^ 

241 + + + + + + 300 

CGGTCJATTTGCTCTTCTAAGTAGTGGCGA^ 

a AS*TRRFITALMAASQTSQT 

GTTGCATCTCACGTTCCTTTTGCAGATTTC 

301 + . + + + + + 360 

CAACGTAGAGTGCAAQGAAAACGTCTAAACAGAAGTTGAAATCTTGC^ 



VASHVPFADLCSTLERIQKS 

AAAGGACGTGCAGAAAAAATCAGACACTTCAGGGAATrTTTA^ 

361 + + + + + . + 420 

TTTanXSCAanCTTTTTTAGTCTGTGAAGT^^ 



a KGRAEKIRHFREFLDSWRKF 

CATGATGCTCTTCATAAGAACCACAAAGATGTCACAGAC^^ 

421 + + + + + + 480 

GTACTACGAGAAGTATTCTTGGTGTTTCrrACAChxS^^ 

a HDALHKNHKDVTDSFYPA MR 

CTAATTCTTCCTCAGCTAGAAAGAGAGAGAATGGCCTATGGAATT 

481 + H + + + + 540 

GATTAAGAAGGAGTCGATCTTTCTCTCTCOT 

a LILPQLERERMAYGIKETML 

GCTAAGCTTTATATTGAGTTGCTTAATTTACCTAGAGATGC^^ 

CXSATTCGAAATATAACTCAACGAATTAAATGGATCr^ 
a AKLYIELLNLPRDGKDALKL
TTAAACTACAGAACACCCACTGGAACnCATGGAGATGCTGGAGACTT^

601 + + + + + + 660 

AATTTGATGTCTTGTGGGTGACCTTGAGTACCT^ 

a LNYRTPTGTHGDAGDFAMIA 

TATTTTGTGTTGAAGCCAAGATGTTTACT^GAAAGGAAGT^ 

661 + + + + + + 720 

ATAAAACACAACTTCGGTTCTACAAATGTCTTTCei^^ 

a YFVLKPRCLQKGSLTIQQVN 

GACCTTTTAGACTGAATTGCGAGCAATAATTCTGCTAAAAG^ 

721 + + + + + + 780 

CTGGAAAATCTGAGTTAACX3GTCX3TTATTAAGAa3AT^^ 

a DLLDSIASNNSAKRKDLIKK 

AGCCTTCTTCAACTTATAACTCAGAGTTCAGCA 

781 + + + + + + 840 

TCGGAAGAAGTTGAATATTCAQTCTCAAGTaSTGAAC^^ 

Ki SLLQLITQSSALEQKWLIRM 

ATCATAAAGGATTTAAAGCTTGGTGTTAGTCAGCAAAC^^ 

841 + + + + + + 900 

TAGTATTTCCTAAATTTCGAACCACAATCTVGTCGT^^ 

a IIKDLKLGVSQQTIFSVFHN 

GATGCTGCrGAGTTGCATAATGTCACTACAGATCT^^ 

901 + + + + + + 960 

CTACGACGACTCAACGTATTACAGTGATCTCTAGAC^^ 

a DAAELHNVTTDLEKVCRQLH 

GATCCTTCTGTAGGACTCAGTGATATTTCrATCACT^ 

961 + + + + + + 1020 

CTAGGAAGACATCCTGAGTCACTATAAAGATAGTGAAATAAAAGACGTAGTT^^ 

a DPSVGLSDISITLFSASKPM 

CTAGCTGCTATTGmGATATTGAGCACATTGAGAAGGATATGAi^ 

1021 + + + + + + 1080 

GATCGACGATAACGTCTATAACTCGTGTAACICTT^^^ 

a LAAIADIEHIEKDMKHQSFY 

ATAGAAACCAAGCTAGATGGTGAACGTATGCAAATGCACAAAGATGGAGATGT^ 

1081 + + + + + + 1140 

TATCTTTGGTTOSATCTACCACITGCAT^ 

a lETKLDGERMQMHKDGDVYK 

TACTTCTCTCGAAATGGATATAACTACACTGATCAGT^^ 

1141 + + + + + + 1200 

ATGAAGAGAGCTTTACCTATATTGAlXnXSACTAGTCAAAC^ 

a YFSRNGYNYT DQFGASPTEG 

TCnCTTACCCCATTCATTCATAATGCATTCA;^ 

1201 + + + + + + 1260 

AGAGAATGGGGTAAGTAAGTATTACGTAAGTTTCGTCTATATGTTTAGA 

a SLTPFIHNAFKADIQICILD 

GGTGAGATGATGGCCTATAATCCTAATACACAAACTTTCATCC^^
1261 + + + + + + 1320

CCACTCTACTACCGGATATTAGGATTATGTGTTTGAAAGTACGTTTTC 

*a GEMMAYNPNTQTFMQKGTKF 

GATATTAAAAGAATGGTAGAGGATTCTGATCTGCAAACTTC 

1321 + + + + + + 1380 

CTATAATTTTCTTACCATCTCCTAAGACTAGACGTTTGAAC^ 

a DIKRMVEDSDLQTCYCVFDV 

TTGATGGTTAATAATAAAAAGCTAGGGCATGAGACrcTGA^^ 

1381 + + 4- + + + 1440 

AACrACGAATTATTATTTTTaSATCXrCGTACTC^^ 

a LMVNNKKLGHETLRKRYEIL 

AGTAGTATTTTTACACXIAATrcCAGGTAGAATAGAAATAGTGCA 

1441 + + + + + -+ 1500 

TCATCATAAAAATGTGGTTAAGGTCXATCrTATCT^ 

SSIFTPIPGRXEIVQKTQAH 

ACTAAGAATCAAGTAATTGATCCATTGAATGAAGCAATAG-^^^ 
1501 + + + + + + 1560 

tgattcttacttcattaactacgtaacttacotasttat^ 
a tknevidalneaidkreegi 
atggtaaaacaacctctatcx:atctacaagcc:agacaaaagaggi^^ 

1561 + + + + + + 1620 

taccattttgttggagataggtagatgttcggtctgtti^^ 
a mvkqplsiykpdkrgegwlk 
attaaaccagagtatgtcagtggactaatggatgaattgg^ 

1621 + + + + + + 1680 

TAATTTGGTCTCATACAGTCACXriGAT^^ 

a IKPEYVSGLMDELDILIVGG 

TATTGGGGTAAAGGATCACGGGGTGGAATGATGTCTCAT^ 

1681 + + + + — — + + 1740 

ATAACCCCATITCCTAGTGCCCCACCITAC^^ 

a YWGKGSRGGMMSHFLCAVAE 

AAGCCCCCTXXTCGTGAGAAGCCATCTGTGT^^ 

1741 + + + + + + 1800 

TTCGGGGGAGGACCACTCTTCGGTAGACACAAAGTATGAGAGAC^ 

a KPPPGEKPSVFHTLSRVGSG 

TGCACCATGAAAGAACTGTATGATCTGGGTTTGAAATIGG^ 

1801 + + + + + + 1860 

ACGTGGTACITTCTTGACATACrAGACCCAAAC^ 

a CTMKELYDLGLKLAKYWKPF 

catagaaaagctccaccaagcagcattttatgtggaacagag;^ 

1861 + + + + + + 1920 

gtatcttttcgaggtggttcgtcgtaaaatacacc^^ 
a hrkappssilcgtekpevyi 

GAACCTTGTAATTCTGTCATTGTTCAGAaTA^^ 
1921 + + + + + + 1980
CTTGGAACATTAAGACAGTAACAAGTCTAATTTCGTCGTCT^

a EPCNSVIVQIKAAEIVPSDM 

TATAAAACTGGCTGCACCITGCGTTTTCCACC^^ 

1981 ^ + + + + + + 2040 

ATATTTTCACCXSACGTGGAACGCAAAAGGTGCTTAACT^^ 

a YKTGCTLRFPRIEKIRDDKE 

TGGCATGAGTGCATGACCCroGAaSACCTAGAAC^ 

2041 + + + + + + 2100 

ACCGTACrcACGTACTGGGACCTGCTGGATCrr^ 

a WHECMTLDDLEQLRGKASGK 

CTCGCATCTAAACACCTTTATATAGGTGGTGATGAT^ 

2101 + + + + + + 2160 

GAGCGTAGATTTGTGGAAATATATCCACCACTACTACTT^ 

a LASK HLYIGGDDEPQEKKRK 

GCIXX:CX;CAAAGATGAAGAAAGTTATTGGAATTATTGAGCAC^ 

2161 + + + + + + 2220 

CGACXXSGGTTTCTACTTCTTTCAAT^ 

a AAPKMKKVIGIIEHLKAPNL 

ACTAACGTTAACAAAATTTCTAATATATTTGAAGAT^ 

2221 + + + + + + 2280 

TGATTGCAATTGTTTTAAAGATTATATAAACIT^ 

a TNVNKISNIFEDVEFCVMSG 

ACAGATAGCCAGCCAAAGCCTGACCTGGAGAACAGAATTGCAC^ 
2281 + + + + + + 2340 

tgtctatcggtcx^gttiosgactggacctct^ 
a tdsqpkpdlenriaefggyi 
gtacaaaatccaggcccagacacgtactctgtaatixx:a(^^ 

2341 + + + + + + 2400 

catgttttaggtccgggtctgtgcatgacacattaa 
a vqnpgpdtycviagsenirv 

AAAAACATAATTTTGTCAAATAAACATGATGTTGTCAAC^^ 

2401 + + + + + + 2460 

TTTTTGTATTAAAACAGTITATTTGTACTACAACAGTTCXX^ 

a KNIILSNKHDVVKPAWLLEC 

TTTAAGACCAAAAGCITTGTACCATGGCAGCCTCGCT 

2461 + + + + + + 2520 

AAATTCTGGTTTTCGAAACATGGTACaSTCGGAGCGAAATACT 

a FKTKSFVPWQPRFMIHMCPS 

ACCAAAGAACATTTTGCCCGTGAATATGATTGCTAT^ 

2521 + + + + + + 2580 

TGGTTTCTTGTAAAACX5GGCACTrATACTAACGATACCACrATCAATAAA 

a TKEHFAREYDCYGDSYFIDT 

GACTTGAACCAACTGAAGGAAGTATTCTCAGGAATTAAAAATT^ 

2581 + + + + + + 2640 

CTGAACTTGGTTGACTTCCTrCATAAGAGTCCTTAATTT^
a DLNQLKEVFSGIKNSNEQTP

GAAGAAATGGCTTCTCTGATTGCrcATT^ 

2641 + + + + + + 2700 

CTTCirrACCGAAGAGACTAACGACTA^ 

a EEMASLIADLEYRYSWDCSP 

CI^GTATGTTTCGACGCCACACCGTTTATTTGG^ 

2701 + + + + + + 2760 

GAGTCATACAAAGCTGCGGTGTGGCAAATAAACCTG^ 

a LSMFRRHTVYLDSYAVINDL 

AGTACCAAAAATGAGGGGACAAGGTTAGCTATTAAAGCCTTGGAGCTT^ 

2761 + + + + + + 2820 

TCATGGTTTTTACTCCXXriOTTCCAATCGAT^ 

a STKNEGTRLAIKALELRFHG 

GGAAAAGTAGTTTCITGTTTAGCTGAGGGAGTC 

2821 + + + + + + 2880 

CGTTTTCATCAAAGAACAAATCGACTCCCTCACA^ 

a AKVVSCLAEGVSHVIIGEDH 

AGTCXSTCTTGCAGATTTTAAAGCTTTTAGAAGAACTT^ 
2881 + + + + + + 2940 

a SRVADFKAFRRTFKRKFKIL 

AAAGAAAGTTGGGTAACTGATTCAATAGACT^GTGTGAATT^ 

2941 + + + + + + 3000 

TTTCTTTCAACCCATTGACTAAGTTAT^^ 

a KESWVTDSIDKCELQEENQY 

TTGATTTAAAGCTAGGTTTCCTAGTGAGGAAAGCCI^^ 

3001 + + + + + + 3060 

AACTAAATTTCGATCCAAAGGATCACTCCTTTCG^ 

\ LI*S*VS**GKPLIWQTHCS - ' 

AGGTGGTAATGATAAAATACTAAACTACATITTATTTTTGTAT^ 

3061 + + + + + + 3120 

TCCACCATTACTATTTTATGATTTGATGTAAAATAAAAACATAGAAT^ 

a RW***NTKLHFIFVS*KSMP 

AAAAAGTATCATTACATATAGGAAAACAATAATTTTAACI^^ 

3121 + + + + -+ + 3180 

TTTTTCATAGTAATGTATATCCTTTTGTrATTAAAAT^ 

a KKYHYI*ENNNFNF*G*KDN 

AGCCCAAAGCCAAGAAAGAAAAATTATCTTGAATGTAGTATT^ 

3181 + +— + + + + 3240 

TCGGGTTTCGGTTCTTTCriTITTAATAG 

a SPKPRKKNYLECSIQ*FFMI 

AAGGTGAAATAAACAGTCTAAAGAAGAGGTOTTTTTATAAT^ 

3241 + + + + + + 3300 

TTCCACTTTATTTGTCAGATTTCTTCnX^

gtgattaaataggctgaaatcagtgtttagtaactacgtacx;ttgtacatgtaacat^

1 + + + : + + + 60 

cactaatttatccgactttagtcacaaatcattgatgcatgcaacatc 

a VIK*AEISV**LRTLYM*HC 

GATATAAATCX5TAAGATTCGCCGAGTATAGATCAATAATATCX3GTTTCAT^ 

61 + + + + + + 120 

CTATATTTAGCATTCTAAGCGGCTCATATCTAGTTATTATAGCCAAAGTAGTGAATGC^ 

a dinrkirrv*innigfityv' 

GTTTGTGCAGTACTAGAGTTAAGATCGTTTTCGATCCCTTA'^ 

121 --+ + + ^ + + + 180 

CAAACACGTCATGATCTCAATTCrrAGCAAAAGCrAGGGAAT^ 

a VCAVLELRSFSIPYFLLFSF 

TTTTTGTTATTTTTCTCTTTTTACCTTTTGTCAC 

181 — + + + + + + 240 

AAAAACAATAAAAAGAGAAAAATGGAAAAGAGTGGTATAATTTAGAAATTTGTTTAG^ 

a FLLFFSFYLLSPY*IFKQI* 

CTATGAAAAAATCCTTTAAACATATGTTAATATGTGGAAAATAAATACTAAAATAAAM 

241 + + + + + + 300 

GATACTTTTTTAGGAAATITGTATACAATTATACACCTTTTATTTATGATr^ 

a L.*KNPLNIC*YVENKY*NKN 

CTAGAACTGAAGGAAATAGTAACGGATTATTTAGGTATGATATCAGCACTAGATTCT 

301 + -+ + + + + 360 

GATCTTGACTTCCTTTATCATIGCCTAATAAATCCATACTATAGTra 

LELKEIVTDYLGMISALDSI 

CC03AGCXX:CAAAACTTTGCGCCTAGTCCAGATIT^ 

3 61 + + + + + + 420 

GGGCTCGGGGTTTTGAAACGCGGATCAGGTCTAAAGTTTACCGAAACAC^^ 

a PEPQNFAPSPDFKWLCEELF 

GTGAAGATACATGAAGTTCAAATTAATGGAACGGCCGGCACTGGCAAA 

421 + + + — + — + + 480 

GACTTCTATGTACrrcAAGTTTAATTACCTTGCCGGCC^ 

a VKIHEVQINGTAGTGKSRSF 

AAGTACTATGAAATAATATCGAATTTCGTCXSAAATGTGGAGAAAAACCGT^ 

481 + : + + + + + 540 

TTCATGATACTTTATTATAGCTTAAAGCAGCTTTACACCTCTT^^ 

a KYYEIISNFVEMWRKTVGNN 

ATATATCCTGCACTGGTTCTTGCTCTTCCCTACCGCGATAGACGAATCT 
541 + + + + + + 600 

TATATAGGACGTGACCAAGAACGAGAAGGGATGGCGCTATCTGCTTAGATATTATAA 
a lYPALVLALPYRDRRIYNIK
GATTATGTATTAATAAGAACTATATGCTCITACTTGAAGT^

601 + — + + + + + 660 

CTAATACATAATTATTCTTGATATACGAGAATGAACTT^^ 

DYVLIRTICSYLKLPKNSAT 

gagcag<:x;gttaaaagattcgaaacagcgtgtcggtaaaggtg(^^ 

661 + + + +-- + + 720 

CTCGTCGCCAATTTTCTAACCTTTGT^ 

EQRLKDWKQRVGKGGNI.SSL 

CTTGTGGAAGAAATTGCTAAAAGAAGGGCTGT^COT^ 

721 + + + + + + 780 

GAACACCTTCTTTAACXSATTTTCTTCCCC^ 

LVEE lAKRRAEPS SKAIT ID^- 

AACGTCAATCACTATCTGGATAGTTTGAGTGGAGACA^^ 

781 + +-T + + + + 840 

TTGCAGTTAGTGATAGACCTATCAAACn^CCTCTC 

InVNKx'LiDSIjSGDRFASGRGF — 

AAGAGTCTTGTCAAGTCCAAACCTTTCCTG^ 

841 + H + + + + 900 

TTCTCAGAACAGTTCAGGTTTGGAAAGGACXSTGACACA 

KSLVKSKPFLHCVENMSFVE 

TTAAAATACrTCTTTGATATOSTGCTTA^^^ 

901 + + + + + — + 960 

AATTTTATGAAGAAACTATAGCACXSAATTTTTAT^^ 

LKYFFDIVLKNRVIGGQEHK 

TTGC^AAACTGCTGGCATCCTOATGCTCAGGATTA^ 

961 + + + + + + 1020 

AACGATTTGACGACXXn'AGGACTACGAGTCCTAATAGAATCGCACT 

LLNCWHPDAQDYLSVISDLK 

1021 + + + + + -. + 1080 

CACCATTGAAGTTTTGAAATACTAGGTTTTCAAGC^ 

VVTSKLYDPKVRLKDDDLSI 

AAAGTTGGCTTTGCATIXXCCCCCXlAATTAGCrJ^^ 

1081 + + + + + + 1140 

TTTCAACCGAAACGTAAGCGGGGGGTrAATCGGTTTT^ 

KVGFAFAPQLAKKVNLSYEK 

ATATGCCGTACACTACATGATGATTTTTTGGTAGAAGAAAAAATG^ 

1141 + + + + + + 1200 

TATACGGCATGTGATGTACTACrrAAAAAACXIATCTTCIT^ 

ICRTLHDDFLVEEKMDGERI 

CAAGTTCATTATATGAATTATGGTGAATCCATAAAATTTT^ 

1201 + + + + + + 1260 

GTTCAAGTAATATACTTAATACCACTTAGGTATTITAAAAAATCATC^^ 

QVHYMNYGESIKFFSRRGID
TATACCTATTTGTACGGAGCGAGCTTATCATCAGGAACTATAT^^

1261 + + + + + + 1320 

ATATGGATAAACATGCCTCGCTCGAATAGTAGTCCTTGATATAGAGT^ 

a YTYLYGASL.SSGTISQHLRF 

. ACAGATAGTGTTAAAGAATGTGTTTTAGATGGAGAAATGGTGACGTTT^ 

1321 + + + + + + 1380 

TGTCTATCACAATTTCTTACACAAAATC^^ 

a TDSVKECVLDGEMVTFDAKR 

1381 + + + + + + 1440 

a RVILPFGLVKGSAKEALSFN 

AGTATAAATAATGTTGACTTTCACCCXriTATATATGGTGTTT^ 
1441 + + , + + + + 1500 

tcatatttattacaactgaaagtggggaatatataccacaaactaga 
sinnvdf. hplymvfdllyln 
gggacitcgttgacacx:attacxx:citc^ 

1501 + + + + + + 1560 

CCCTGAAGCAACTGTGGTAATGGGGAAGTAGTTTCCTTCGTTA^^ 

a GTSLTPL PLHQR KQYLNSIL 

AGTCCCrrTGAAAAATATTGTAGAAATAGTACGATCTT^ 

1561 + + + + + + 1620 

TCAGGGAACTTTTTATAACATCTTTATCATGCT 

a SPLKNIVE IVRSSRCYGVES 

ATCAAAAAGTCITTAGAAGTTGCAATCTCACl^^ 

1621 + + — -7 + + — + + 1680 

TAGTITTrcAGAAATCTTCAAaSTTAGAGTGACCC^ 

a IKKSLEVAISLGSEGVVLKY 

TATAATTCAAGTTATAATGTCGCXIIAGTCGAAACAAC^ 

1681 +~ + + + + + 1740 

ATATTAAGTTCAATAITACAGCGGTCAGCTTTGTTC 

a YNSSYNVASRNNNWIKVKPE 

TATTTGGAGGAATTTGGAGAGAATTTAGACTTAATAGT^ 
1741 + + + + +^ + 1300 

a YLEEFGENLDLIVIGRDSGK 

AAAGATTCTTTTATGCTAGGGTTACTTGTGCTAGAT^^ 

1801 + + + + + + I860 

TTTCTAAGAAAATACGATCCCAATGAACACGATCTACTTCa^^ 

a KDSFMLGLLVLDEEEYKKHQ 

GGAGACTCXiriXnxSAAATTGTAGACCACTC^ 

1861 + + + + + + 1920 

CCTCTGAGGAGACTTTAACATCTGGTGAGTT^ 

a GDSSEIVDHSSQEKHIQNSR 
AGAAGGGTGAAAAAAATACTTTCAT
1921 + + + + . + + 1980

TCTTCCCACTTTTTTTATGAAAGTAAGAC^ 

a RRVKKILSFCSIANGISQEE 

TTCAAAGAAATCGACCGCAAAACXSAGAGGACATTGGAAAAGAACC^ 

1981 + + + + + 2040 

AAGTTTCTTTAGCTGGCXnTTTGCTCTC^ 

a FKEIDRKTRGHWKRTSEVAP 

CCTGCTTCAATTTTAGAATTTCGCTCAAAAATA 

2041 + + + + + + 2100 

GGACXSAAGTTAAAATCTTAAACCGAGTTTTTATGGAC^^ 

a PASILEFGSKIPAEWIDPSE 

TCAATTGTTCTAGAAATAAAATCACGGTCr^^ 

2101 ^ + + + + 2160 

AGTTAACAAGATCITTATTTTAGTOXAGAAACCTATO 

^ SIVLEIKSRSLDNTETNMQK 

2161 1 H V H v 1- 2220 

ATGCGATGGTTAACATGAAACATGCCACCGATAACATTTTCTO 

a YATNCTLYGGYCKRIRYDKE 

TGGACAGATTGTTACACACTTAAaSACTTATACGAAAGTAGG^ 

2221 + + + + + + 2280 

ACCTGTCTAACAATGTGTGAATTGCTGAATATGCTTT^ 

a WTD-CYTLNDLYESRTVKSNP 

AGCTATCAAGCGGAAAGGTCACAGCTTGGATTGATAa^^ 

2281 + + + + + + 2340 

TCGATAGTTCGCCTTTCCAGTGTCGAACXn'AACTAT^ 

a SYQAERSQLGLIRKKRKRVL 

ATTTCAGACAGCTTTCACCAAAACAGGAAACAAC^^ 

2341 + + + + + + 2400 

TAAAGTCTGTCGAAAGTGGTITTGTCCTTTGTT^ 

a ISDSFHQNRKQLPISNIFAG 

TTACTTTTTTATGTICICTCTGACTATGT^ 

2401 + + + + + + 2460 

AATGAAAAAATACAAGAGAGACTGATACAGTGCCTCCTGTGACCTTATG^ 

a LLFYVLSDYVTEDTGIRITR 

GCAGAACITGAAAAAACTATTGTGGAACATGGTGGTAAAC^ 

2461 + + + + + — + 2520 

CGTCTTGAACTTTTTTGATAACACXrrTG^ 

a AELEKTIVEHGGKLIYNVIL 

AAACGTCATTCAATTGGGGACXnnXXSGTTAATCAGCTC 

^^^^ ZZZZ 2580 

TTTGCAGTAAGTTAACCCCTGCAAGCCAATTAGTCGACATT^ 

a KRHSIGDVRLISCKTTTECK 

GCTTTAATAGATCGAGGATATGATATATTGCACCCAAATTGGGTACT^ 
2581 + + + ^ + ^ 2.640
CGAAATTATCTAGCTCCTATACTATATAACXSTGGGTTTAACCCAT^
• a ALIDRGYDILHPNWVLDCIA 

TATAAGAGGCTCATCCIUATCGAGCCCAATTATTGCTTTAA^ 

2641 + + + + + + 2700 

ATATTCTCCGAGTAGGACTAGCTCGGGTTAATAACGAAATTGCAGA^ 

a YKRLILIEPNYCFNVSQKMR 

GCCGTCGCTGAAAAAAGGGTAGATTGTTTGGGTGATAGTTTO 

2701 ■ — + + + + + 2760 

CGGCAGCGACTTTTTTCCCATCTAACAAACrCACTATC^^ 

a AVAEKRVDCLGDSFENDISE 

ACCAAACTGTCATCATTGTATAAATCACAACTAAGTCTACCACC^ 

2761 + + + + + -+ 2820 

TGGTTTGACAGTAGTAACATATTTAGTGTTGATTC^ 

a TKLSSLYKSQLSLPPMGELE 

ATAGATTCTGAGGTTCGGCGGTTTCCATTATTTT^ 

2821 + + + + + + 2880 

TATCTAAGACTCCAAGCCGCCAAAGGTAATAAAAATAAGAGGTIGTCCT 

a IDSEVRRFPLFLFSNRIAYV 

CCACGTCGCAAAATTAGCACAGAAGATGACATTATAGAAATGAAAATTAAGTl^ 

2881 + + + + + + 2940 

GGTGCAGCGTTTTAATCGTGTCTTCT^ 

a PRRKISTEDDIIEMKIKLFG 

GGAAAAATAACAGATCAACAGTCACITTGTAACTTAATAATTATACX^ 

^^^•^ + + + + + + 3000 

CXriTTTTATTGTCTAGTTGTCAGTGAAACAT^ 

a GKXTDQQSLCNLIIIPYTDP 

ATTTTGAGGAAAGACnGCATGAATGAGGTACACGAAAA 

3001 + + + . + + + 3060 

TAAAACTCCTTTCTGACGTACTTACTCCATC 

a ILRKDCMNEVHEKIKEQIKA 

TCTGATACTATACCGAAAATAGCCAGGGTCGTTGCCCC^^ 

3061 + + + + + + 3120 

AGACTATGATATGGCTTTTATCX3GTCCCAGCAAC^ 

a SDTIPKIARVVAPEWVDHSI 

AATGAAAACTGTCAAGTGCCTGAAGAAGACTTCCCCGTAGTCA^ 

^■^^^ + + + ^ + ^ 33^QQ 

TTACTTTTGACAGTTCACGGACTTCTTCTGAAGGG^ 

a NENCQVPEEnFPVVNY*WCV 

TTGCGGAGGCTTAATTTTTTGAAGTTTATTTAATACTATCCT 
3181 + + + + + + 3240 

AACGCCTCCGAATTAAAAAACTTCAAATAAATTATGATAGGATGTATAC^ 

a LRRLNFLKFI*YYPTYVH*I 

CTTCCGTAACGTTTAOCAATAAGAGTGGAAGATGCGCAATTATATTCAA^ 
3241 + + + + + + 3300

GAAGGCATTGCAAATAGTTATOCTCACCTTCTACGCGTTAATATAAGT^^
a LP*RLSIRVEDAQLYSKDWP

GTCAATTAACTTAAGGAAAAAAT 

3301 + + 3323

CAGTTAATTGAATTCCTTTTTTA 
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l^my. June 20, 1997 f^^^t^^X^Y^ ' ^ 

• ''\_SEQUENCE 1.0 

ID HSU47077 standard; RNA; HUM; 13506 BP.

XX

AC U47077;
XX

NI g1765937
XX

DT 22-FEB-1997 (Rel. 51, Created)

DT 22-FEB-1997 (Rel. 51, Last updated, Version 1)
XX

DE Human DNA-dependent protein kinase catalytic subunit (DNA-PKcs)

DE mRNA, complete cds.

XX

KW DNA-PKcs; double-strand break; mouse SCID product; repair;
KW serine/threonine protein kinase; V(D)J recombination.
XX

OS Homo sapiens (human)

OC Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;

OC Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.

XX

RN [1] 

RP 1-13506 

RX MEDLINE; 95401275. 

RA Hartley K.O., Gell D., Smith G.C., Zhang H., Divecha N.,
RA Connelly M.A., Admon A., Lees-Miller S.P., Anderson C.W.,
RA Jackson S.P.;

RT "DNA-dependent protein kinase catalytic subunit: a relative of

RT phosphatidylinositol 3-kinase and the ataxia telangiectasia gene

RT product";

RL Cell 82:849-856(1995).

XX

RN [2]

RP 1-13506

RA Connelly M.A., Zhang H., Kieleczawa J., Anderson C.W.;

RT "Alternate splice-site utilization in the gene for the catalytic

RT subunit of the DNA-activated protein kinase, DNA-PKcs";

RL Gene 175:271-273(1996).

XX

RN [3]

RP 1-13506

RA Anderson C.W.;

RT ;

RL Submitted (25-JAN-1996) to the EMBL/GenBank/DDBJ databases.

RL Biology, Brookhaven National Laboratory, 50 Bell Avenue, Upton, NY 

RL 11973-5000, USA 

XX 

FH Key Location/Qualifiers 
FH 

FT source 1..13506

FT /organism="Homo sapiens"

FT /chromosome="8"

FT /map="8q11"

FT /cell_line="Daudi and CCRF-CEM"

FT /clone_lib="Clontech HL-1117a and HL-lOeSS"

FT CDS 58..12441

FT /gene="DNA-PKcs"

FT /codon_start=1

FT /product="DNA-dependent protein kinase catalytic subunit"

FT /db_xref="PID:g1765938"

FT /translation="MAGSGAGVRCSLLRLQETLSAADRCGAALAGHQLIRGLGQECVL3

FT SSPAVIALQTSLVFSRDFGLLVFVRKSLNSIEFRECREEILKFLCXFLEKMGQKIAPYS 

FT VEIKNTCrSVYTKDRAAKCKIPALDlilKIJJQTFRSSRLI^EFK 

FT KKIPDTV^JEKVYELLGLLGEVHPSEMINN7^ENLFRAFLGELKTQ^^^SA\^ 

FT CLKGLSSLLCNFTKSMEEDPQTSREIFNFVOAIRPQIDI^KRYAVPSAGIJU^AI^ 

FT FSTCLLDNYVSLFEVLLKWCAHTNVELKKAALSALESFLJCQVSNMVAK^^^ 

FT FMEQFYGIIRNVDS^INKELSIAIRGYGLFAGPCKVINAKDVDF^^A/ELIQRCK^ 



TDTGDYRWQMPSFI^SVASVU^YLDTVPEWTPVI^mjVVl^ 
RAJ^/KWLAIJ^GPVlJy^CISTWHQGLIRXCSKPVVI^GPESE^ 
KWKVPTYKDYVDLFRHLLSSDQMMDSIIJ^EAFFSVNSSS^ 
EKXIiLTLEXOT^/GEQEN^DEAPGVWMIPTSDPAANLHPiiJCPKDFSAF 
EKQAEFri.->-v-vSF3yHLlI^S;rJL?LI3GFYKLLSX-T^ 
PEDPEKYSCFALfVKFGKIVAVKMKQYKDELI^CLTFTJLSLPHm 
FT ^^AF^a^LSYTPIAEVGO^ALEEWSIYIDRHVM 

FT WEVSAI^RAAQKGFNKWLKHIJatTKKr^SNEAJSI^IRIRW^^ 

FT T\iT'SSDE3«MKSYVAWDREKRLSFAVPFREMKP\7IFLD\^ 

FT CEIJJHSMVMFMI^3KATQMPEXSGQGAPPt4YQLYra 

FT QLIHWF^JKKFESQDTVSLLEAILDGIVDFVDSTLRDFC^RCIR^ 

FT QEKSFV^ITKSLFKRLYStiAIj^PNAFKRLGASLAF^^^ 

FT MESIJU^JiADEKSLGTrQQCCrmiDHI^RIlEKKHVSIJ^KAI^^ 

FT DLVKWU^AHCGRPQTECIOIKSIELFYKFVPIJ^PGNRSPlsn^^ 

FT gggcgqpsgiiaqptllylrgpfsi^atu^tldllia;^ 

FT EAQSSIJLXAVaFFL.ESIA^IHDI lAAEKCFGTGAAGNRTSPQEGERYNYSKCTVWRrME 

FT FTTTIiOTSPEGWKIJ:JCKELC^^^HI^^^^ 

FT MKALKMSPYIODILETHlJ^ITAQSIEEI/CAVOT^YGPn^^ 

FT GIJ^HNILPSQSTDLHHSVGTELLSIArilCGIAPGDER<y:LPSLDLSCKQIJ^ 

FT FGGIiCERLVSU^LNPAVLSTASIiGSSQGSVIHFSHGEYFYSIJSErrim^XJ^^ 

FT LELMQSSVnNTKMVSAVL^K»lLDQSFRERANQKHQG 

FT LErn<MAVLAIJlJ^I^IDSSVSFOTSHGSFPEVFTTYISLJ^^ 

FT FFTSLTGGSLEELRRVI^QLIVAHFmQSREFPPGTPRFNNYVD^^ 

FT MLLEL^m^;LCREQQHVMEEIJ^QSSFRRIARRGSCVTQVGI^ 

FT P-QSF^-n3RSIiTLLVJKCSLIiALREFF3TTVVDAIDVIJ<SK^^ 

FT KIIJ^VMYSRLPKDDVHAKESKIl^VFHGSCITECa^ELT^ 

FT L«E3lRRLYHCAAYNCAISVlCCVFNEXKFyQGFIJ 

FT EVEVPMERKKKYIEXRKEARR^ANGDSDGPSYHSSLSYIiADSTI^EEM^ 
FT YSYSSQDPRPATGRFRRREQRDPTVHDDVtiEa^EMDEa^NRHEC^^ 
FT PQGEEDSV^RDIJSWMKFI^GKLGiaPIVPLNIRIiFLJi^ 

QLAASE^3NGGEGIHYMWErVATILSVTOLATPTGVPKDEV^ 
. AWRHNI£IIKTL\mcWKrx:LSIPYRLIFEKFSGKDPNSraNSVGIQLI^ 
FT YDPQCGIQSSEYFQALVNNMSFVRYKZ^^AAAAEVI^LII^YVt^ERK^^ 
FT KQ IJCQH Q^^MEDKFIVCtJ^KVTKSFPPLADRFM^l^WFIX 
FT EG^^ImYFQLKSKDFVQVMRHRDERQKVCLDI^^^^KM^ 
FT , STTCREQMlfNILMWXHDNYRDPESETimJSQEIFKt^^ 

FT WSHETRLPSISPITJDRIiAI2JSLYSPKIEVHFl^IATNFIX^^ 
^ CEFQEYTIiDSDWRFRSTVLTPMFVETQASQGTI/2TRTQEGSLSARWPVAGQI^ 

• DFTLTQTADGRSSPDWIiTGSSTDPLVDHTS PSSDSLLFAHKRSERLQRAPLKSVGPDFG 
KKRLGLPGDEVDNKVKGAAGRTDLIJ?IJ^RRFMRDQEI^ 

LKMKQnAQ\A/LYRSYRHGDLPDIQIKHSSLITPLOAVAQRDPI I AKQLFS SLFSGXLKE 
FT MDKFKrn^SEKraOITQKIJ^DFNRFIJ^TTFSFFPPFVSCIQD 
FT GCIA51^PVGIRIJLEEALLRLI^AELPAKRVRGKARLPPDVIJIWV^^ 

VLRGIFTSE1GTKQITQSALIJ\EARSDYSEAAKQYDEMJ^QDWVI>3EPTEAE^ 
ASIJX^HLAEWKSI^CSTASIDSENPPDU^IWSEPFYQETYLPYM^ 
FT EAJ^QSIXTFIDKAMHGELQKAI1£LHYSQELSIXYIJJQDI:^/DRA^^ 
FT SSIDVIJJHQSRLTKLQSVQALTETQEFISFISKQGNI^SQfV^ 

FT DPMNIWDDI ITNRCFFLSKIEEKLTPLPEDNSMNVDQDGDPSDRMEVQEQEEDISSLIR 

FT SCKFSMKMKMII)SARKQ^3NFSIJ^MKIiKELHKESKTRDI:^ 

ft gcseqvltvijctvsij^eisn^ssylskniuafri>3niiii/3tt^ 
ft ieedkarrilblsgsssedsi3cviaglyqrafqhlsel\vq^^ 

ij:aymtiju3fcdqqlrkeeenas\^ 

lqi lerypeetlslmtkeissvpcwqfi swishmvaijl.dki>3avavqhsv^ 

AIVYPFXISSESYSFK]>rSTGHKNKEFVARIKSKLrX3GGVIQDFXNAI^ 
nWSliJDVRAELAKTPVNKKmEKMY^ 
FT KGGSKIXRMKI^DF^^DITNMUJJKM^^aDSKPPGNIJ^ 
FT YIX3RGKPLPEYH\miAGFDERVTV;MASIJ^RPK3U:iIRGHDEREHPFLVK 
FT VEQ^^QVM^K3IXJ^QDSACSQRAIiQLRTYSWPMTSRLJGLIEWLEN^ 
FT EE30^YLSDPRAPPCEYKEIWLTKMSGKHD^A3AYMOOT 
FT ULKRAFVRMSTSPEAFLArj^HTASSHALICISHWIIXSIGDRHIJJNF^^ 

DFGHAFGSATQFLP^^ELMPFRIiTRQFIlSn^MLPMKETGLJMyS n-IVHALRAFRSDPGLLT 
mmJVFVKEPSFDWKNFEQKMUaCGGSWIQEIW 
FT ITCDEI^LGHEKAPAFRDYVAVARGSiaDH^^:RAQEPESGI^EETQVKClJ4DQA^ 
FT GRTWEGWEPWM" 
FT exon 11451..11544

FT /gene="DNA-PKcs"
FT /note="exon occasionally removed by improper splicing;
FT alternatively spliced transcript without this exon
FT deposited in GenBank Accession Number U34994"

XX

SQ Sequence 13506 BP; 3922 A; 2999 C; 3222 G; 3571 T; 0 other;



U47077 


Length: 13506 June 20, 1997 09:35 Type: N 


Check: 532 


1 GGGGCATT^ CGGGTCCGGG CCGAGCGGGC GCACGCGCGG GAGCGGQACT

51 CGGCGGCATG GCGGGCTCCG GAGCCGGTGT GCGTTGCTCC CTGCTGCGGC

101 TGCAGGAGAC CTTGTCCGCT GCGOACCGCT GCGGTGCTGC CCTSGCOGGT

151 CATCAACTGA TCCGCGGCCT GGGGCAGOAA TGCGTCCTGA GCAGCAGCCC

201 CGCGGTGCTG GCATTACAfiA CATCTITAGT TTTTTCCAGA GATrrCGGTT

251 TGCTTGTATT TGTCCGGAAG TCACTCAACA GTATTQAATT TCGTGAATGT

301 AGAGAAGAAA TCCTAAAGTT TTTATGTATT. TTCTTAGATiA AAATGGGCCA

351 GAAGATCGCA CCTTACTCTG TTGAAATTAA G?ACACTTGT ACCACSTGTTT

401 TAGAGCTGCT AAATGTAAAA TTCCAGCCCT GGACCTTCTT

451 ATTAAGTTAC TTCAGACTTT TAGAAGTTCT AGACTCATGG ATGAATTTAA

501 AATTGGAGAA TTATTTAGTA AATTCTATGG AGAACTTGCA TIGAAAAAAA

551 AAATACCAGA TACAGTTTTA GAAAAAGTAT ATGAiGCTCCT AGGATTATTG

601 GGTGAAGTTC ATCCTAGTGA GATGATAAAT AATGCAGAAA ACCTGTTCCG

651 CGCTTTTCTG GGTGAACTTA AGACCCAGAT GACATCAGCA GTAAGAGAGC

701 CCAAACTACC TGTTCTGGCA GGATGTCTGA AGGGGTTGTC CTCACTICTG

751 TGCAACTTCA CTAAGTCCAT GGAAGAAGAT CCCCAGACTT CAAGGGAGAT

801 TTTT7ATTTT GTACTAAAGG CAATTCGTCC TCAGATTGAT CTGAAGAGAT

851 ATGCTGTGCC CTCAGCTGGC TTGCGCCTAT TTGCCCTGCA TGCATCTCAG

901 TTTAGCACCT GCCTTCIGGA CAACTACGTG TCTCTATTTG AAGTCTTOTr

951 AAAGTGGTGT C5CCCACACAA ATGTAGAATT GAAAAAA(3CT GCACTTTCAG

1001 CCCTGGAATC CTTTCTGAAA CAGGTTTCTA ATATGGTGGC GAAAAATGCA

1051 GAAATGCATA AAAATAAACT GCAGTACTTT ATGGAGCAGT TTTATGGAAT

1101 CATCAGAAAT GTGGATTCGA ACAACAAGGA GTTATCTATT GCTATCCGTG

1151 GATATGGACT TTXTGCAGGA CCGTGCAAGG TEATAAACGC AAAAGATGTT

1201 GACTTCATGT ACGTTGAGCT CATTCAGCGC TGCAAGCAGA TGTTCCTCAC

1251 CCAGACAGAC ACTGGTGACT ACCGTGTTTA TCAGATGCCA AGCTTCCTCC

1301 AGTCTGTTGC AAGCGTCTTG CTGTACCTTG ACACAGTTCC TGAGGTGTAT

1351 ACTCCAI3TTC TGGAGCACCT CGTGGTC3ATG CAGATAGACA GTTTCCCACA

1401 GTACAGTCCA AAAATGCAGC TGGTGTGTTG CAGAGCCATA GTaAAGGTGT

1451 TCCTAGCTTT GGCAGCAAAA GGGCCAGTTC TCAGGAATTG CATTAGTACT 

1501 GTGGTGCATC AGGGTTTAAT CAGAATATGT TCTAAACCAG TGGTCCTTCC 

ct;i ? ■& t ri''^Gr'CC^ .-^a^^Tv^w^j^y/p CTC-'-ACACC?- CC't^TGCTTCA CrG-GGAAGTCA 

1601 GAACTGGCAA ATGGAAGGTG CCCACATACA AAGACTACGT GGATCTCTTC 

1651 AGACATCTCC TGAGCTCTGA CCAGATGATG GATTCTATTT TAGCAGATGA 

1701 AGCATTTTTC TCTGTGAATT CCTCCAGTGA AAGTCTGAAT CATTTACTTT

1751 ATGATGAATT TGTAAAATCC GTnTGAAGA TTGTTGAGAA ATTGGATCTT 

1801 ACACTTGAAA TACAGACTGT TGGGGAACAA GAGAATGGAG ATGAGGCGCC 

1851 TGGTGTXTGG ATGATCCCAA CTTCAGATCC AGCGGCTAAC TTGCA.TCCAG 

1901 CTAAACCTAA Aai^TTTTTCG GCTTTCATTA .ACCTGGTGGA ATTTTGCAGA 

1951 GAGATTCTCC CTGAGAAACA AGCAGAATTT TTTGAACCAT GGGTGTACTC

2001 ATTTTCATAT GAATTAATTT TGCAATCTAC AAGGTTGCCC CTCATCAGTG 

2051 GTTTCTACAA A'TTGOTTrCT ATTACAGTAA GAAATGCCAA GAAAATAAAA 

2101 TATTTCGAGG GAGTTAGTCX: AAAGAGTCTG AAACACTCTC CTGAAGACCC 

2151 AGAAAAGTAT TCTTGCTTTG CTTTA1TTGT GAAATTTGGC AAAGAGGTGG 

2201 CAGTTAAAAT GAAGCAGTAC AAAGATGAAC TTTTGGCCTC TTGTTTGACC 

2251 TTTCTTCTGT CCTTGCCACA CAACATCATT GAACTCGATG TTAGAGCCTA 

2301 CCSTTCCTGCA CTGCAGATGG CTTTCAAACT GGGCCTGAGC TATACCCCCT 

2351 TGGCAG7VAGT AGGCCTGAAT GCTCTAGAAG AATGGTCAAT TTATATTGAC

2401 AGACATGTAA TGCAGCCTTA TTACAAAGAC ATTCTCCCCT GCCTGGATGG 

2451 ATACCTGAAG ACTTCAGCCT TGTCAGATGA GACCAAGAAT AACTGGGAAG 

2501 TGTCAGCTCT TTCTCGGGCT GCCCAGAAAG GATTTAATAA AGTGGTQTTA 

2551 AAGCATCTGA AGAAGACAAA GAACCTTTCA TCAAACGAAG CAATATCCTT 

2601 AGAAGAAATA AGAATTAGAG TAGTACAAAT GCITGGATCT CTAGGAGGAC 

2651 AAATAAACAA AAATCTTCTG ACAGTCACGT CCTCAGATGA GATGATGAAG 

2701 AGCTATGTGG CCTGGGAGAG AGAGAAGCGG CTaAGCTTTG CAGTGCCCTT 

2751 TAGAfiAGATG AAACCIGTCA TTTTCCTGGA TGTGTTCCTG CCTCGAGTCA 

2801 CAGAATTAGC GCTCACAGCC AGTGACAGAC AAACTAAAGT TGCAGCCTGT 

2851 GAACTTTTAC ATAGCATGGT TATGTTTATG TTGGGCAAAG CCACGCAGAT 

2901 GCCAGAAGGG GGACAGGGAG CCCCACCCAT GTACCAGCTC TATAAGCGGA 

2951 CGTTTCCTCT GCTGCTTCGA CTTGCGTGTG ATGTTGATCA GGTGACAAGG 

3001 CAACTGTATG AGCCACTAGT TATGCAGCTG ATTCACTGGT TCACTAACAA 

3051 CAAGAAATTT GAAAGTCAGG ATACTGTTTC CTTACTAGAA GCTATATTTGG 



3101 ATGGAATTGT GGACCCTGTT GACAGTACTT TAAGAGATTT TTGTGGTCGG 

3151 TGTATTCGAG AATTCCTTAA ATGGTCCATT AAGCAAATAA C^CCACAGCA 

?20'' GC^''V3?GAAG AGTCCAGT-^A ACJ^.CCA^-^-TC 3CTTr?TCAAG CGACTTTATA 

3251 GCCTTGCGCT TCACCCCAAT GCTTTCAAGA GGCTGGGAGC ATCACTTGCC 

3301 TTTAATAATA TCTACAGOaA ATTCAGGGAA GAAGAGTCTC TGGTGGAACA 

3351 (j' l ' l ' l ' Cj ' llj ' iU ' i ' GAAGCCTTGG TGATATACAT GGAGAGTCTG GCCTTAGCAC 

3401 ATGCAaATGA GAAGTCCTTA GGTACAATTC AACAGTG1TG TGATGCCATTT 

3451 GATCACCTAT GCCGCATCAT TGAAAAGAAG CATGTrrCTT TAAATAAAGC 

3501 AAAGAAACGA CGTTTGCCGC ai^GGATTTCC ACCTTCCGCA TCATTGTGTT 

3551 TATIGGATCT GGTCAAGTGG CTTTTAGCTC ATTGTGGGAG GCCCXIAGACA 

3601 GAATGTCGAC ACAAATCCAT TGAACTCTTT TATAAATTCG TTCCTTTATT 

3651 GCCAGGCAAC AGATCCCCTA ATTTGTGGCT GAAAGATGTT CTCAAGGAAG 

3701 AAGGTGTCTC TTITCTCATC AACACCTTTG AGGGGGGTGG CTGTGGCCAG 

3751 CCCTCGGGCA TCCTGGCCCA GCCCACCCTC TTGTACCTTC GGGGGCCATT 

3801 CAGCCTGCAG GCCACGCTAT GOTGGCTGGA CCTGCTCCTG GCCGCGTTGG 

3851 AGTGCTACAA CACGTTCATT GGCGAGAGAA CTGTAGGAGC GCTCCAGGTC 

3901 CTAGGTACTG AAGCCCAGTC TTCACrTTTG AAAGCAGTGG CTTTCTTCTT 

3951 AGAAAGCATT GCCATGCATG ACATTATAGC AGCAGAAAAG TGCnTGGCA 

4001 CTGGGGCAGC AGGTAACAGA ACAAGCCCAC AAGAGGGAGA AAGGTACAAC 

4051 TACAGGAAAT GCACCGTTGT GGTCCGGATT ATGGAGTTTA CCACaACTCT 

4101 GCTAAACACC TCCCCGGAAG GATGGAAGCT CCTGAAGAAC GACTTGTGTA 

4151 ATACACACCT GATGAGAGTC CTGGTGCAGA CGCTGTGTGA GCCCGCAAGC 

4201 ATAGGTTTCA ACATCGGAGA CGTCCAGGTT ATGGCTCATC TTCCTGATGT 

4251 TTGTGTGAAT CIGATGAAAG CTCTAAAGAT GTCCCCATAC AAAGATATCC 

4301 TAGAGACCCA TCTGAGAGAG AAAATAACAJG CACAGAGCAT 'TGAGGAGCTT 

4351 TGTGCCGTCA ACTTGTATGG CCCTGACGCG CAAGTGGACA GGAGGAGGCTT 

4401 GGCTGCTGTT GTGTCTGCCT GTAAACAGCT TCACAGAGCT GGGCTTCTGC 

4451 ATAATATATT ACCGTCTCAG TCCACAGATT TGCATCATTC TGTTGGCACA 

4501 GAACTTCTTT CCCTGGTTTA TAAAGGCATT GCCCCTGGAG ATGAGAGACA 

4551 GTGTCTGCCT TCTCTAGACC TCAGTTGTAA GCAGCTGGCC AGCGGACTTC 

4601 IGG-AGXTJ^JSC CTTTGCTTTT GGAGGACTGT GTGAGCGCCT TGTGAGTCTT 

4651 CTCCTGAACC CAGCGGTGCT GTCCACGGCG TCCTTGGGCA GCTCACAGGG 

4701 CAGCGTCATC CACTTCTCCC ATGGGGAGTA nTCTATAGC TTGTTCTCAG
4751 AAACGXTCAA CACGGAATTA TTGAAAAATC TGGATCTTGC TGTATTGGAG

4801 CTCATGCAGT CTTCAGTGGA TAATACCAAA ATGGTGAGTG CCGTTTTGAA 

4 Sri CC-CCi'-.I^^TT?. G?,C'CAC=ACtCT TCAC?GQ?.GCG .?jC?d^_-ACCAG .VA.^.CACCA^.G 

4901 GACTGAAACT TGCGACTAa\ ATTCTGCAAC ACTGGAAGAA GTGTGAOTCA 

4951 TGGTGGGCCA AAGATTCCCC TCTCGAAACT AAAATGGCAG. 1GCTGGCCTT 

5001 ACTGGCAAAA ATTTTAC^GA TTGATTCATC TGTATCTTTT AATACAAGTC 

5051 ATGGTTCATT CCCTGAAGTC TTTACAACAT ATATTAGTCT ACTIGCTGAC 

5101 ACAAAGCTGG ATCTACATTT AAAGGGCCAA GCTGTCACTC .TTCTTCCATT 

5151 CTTCACCAGC CTCACTGGAG GCAGTCTGGA GGAACTTAa\ CGTGTTCTGG 

5201 AGCAGCTCAT CGTTGCTCAC TTCCCCATGC AGTCCAGGGA ATTTCCTCCA 

5251 GGAACTCCGC GGTTCAATAA TTATGTGGAC TGCATGAAAA AGTTTCTAiGA 

5301 TGCATTGGAA TTATCTCAAA GCCCTATGTT GTTGQAATTG ATGACAGAAG 

5351 TTCTTOGTCG GGAACAGCAG CATGTCATGG AAGAATTATT TCAATCCAGT 

5401 TTCAGGAGGA TTGCCAGAAG GGGTTCATGT GTCACACAAG TAGGCCTTCT 

5451 GGAAAGCGTG TATGAAATGT TCAGGAAGGA TGACCCCCGC CTAAGTTTCA 

5501 CACGCCAGTC CTTTGTGGAC CGCTCCCTCC TCACTCTGCT GTGGCACTGT 

5551 AGCCTGGATG CTTTGAGAGA ATTCTTCAGC ACAATTGTGG TGGATGCCAT 

5601 TGATGTGTTG AAGTCCAGGT TTACAAAGCT AAATGAATCT ACCTTTGATA 

5651 CTCAAATCAC CAAGAAGATC GGCTACTATA AGATTCTAGA CGTGATGTAT 

5701 TCTCGCCTTC CCAAAiSATGA TGTTCATGCT AAGGAATCAA AAATTAATCA 

5751 AGTTTTCCAT GGCTCGTGTA TTACAGAAGG AAATGAACTT ACAAAGACAT 

5801 TGATTAAATT GTGCTACGAT GCATTTACAG AGAACATGGC AGGAGAGAAT 

5851 CAGCTGCTGG AGAGGAGAAG ACTTTACCAT TGTGCAGCAT ACAACTGCGC 

5901 CATATCTGTC ATCTGCTGTG TCTTCAATGA GTTAAAATTT TACCAAGGTT 

5951 TTCIOTTTAG TCAAAAACCA GAAAAGAACT TGCTTATTTT TCAAAATCTG 

6001 ATCGACCTGA AGCGCCGCTA TAATTTTCCT GTAGAAGTTG AGGTTCCTAT 

6051 GGAAAGAAAG AAAAAGTACA TTGAAATTAG GAAAGAAGCC AGAGAAGCAG 

6101 CAAATGGGGA TTCAGATGGT CCTTCCTATA TGTCTTCCCT GTCATATTTG 

6151 GCAGACAGTA CCCTGAGTGA GGAAATGAGT CAATTTGATT TCTCAACCGG 

6201 AGTTCAGAGC TATTGATACA GCTCCCAAGA CCCTAGACCT GCCACTGGTC 

6251 GTTTTCGGAG ACGGGAGCAG CGGGACCCCA CGGTGCATGA TGATGTGCTG 

6301 GP.GCTGGAGA TGGACGAGCT CAATCGGCAT GAGTGCATGG CGCCCCTGAC 

6351 GGCCCTGGTC AAGCACATGC AaAGAAGCCT GGGCCCGCCT CAAGGAGAAG 
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6401 AGGATTCAGT GCCAAGAGAT CTTCCTTCTT GGATGAAATT CCTCCATGGC 

6451 AAACTGGGAA ATCCAATAGT ACCA.TTAAAT ATCCC3TCTCT TCTTAGCCAA 

6501 GCTT^TTA'TT ?ATAOiGAPJ3 AOC^TCTTTCG CCCTTACGCG >A.GCACTGGC 

6551 TTAGCCCCTT GCTGCAGCTG GCTGCTTCTG AAAACAATGG AGGAGAAGGA 

6601 ATICACTACA TGGTGGOTGA GATAGTGGCC ACTATTCTTT CATGGACAGG 

6651 CTTGGCCACT CCAACAGGGG TCCCTAAAGA TGAAGTGTTA GCAAATCGAT 

6701 TGCTTAATTT CCTAATGAAA CATGTCTTTC ATCCAAAAAG AGCTGTGTTT 

6751 AGACACAACC TTGAAATTA.T AAAGACCCTT GTCGAGTGCT GGAAGGATTG 

6801 TTTATCCATC CCTTATAGGT TAATATriGA AAAGTTTTCC GGTAAAGATC 

6851 CTAATTCTAA AGACAACTCA GTAGGGATTC AATTGCTAGG CATCGTGATG 

6901 GCCAATGACC TGCCTCCCTA TGACCCACAG TGTGGCATCC AGAGTAGCGA 

6951 A.TACTTCCAG GCTTTGGTGA ATAATATGTC CTTTGTAAGA TATAAAGAAG 

7001 TGTATGCCGC TGCAGCAGAA GTTCTAfiGAC TTATACTTCG ATATGTTATG 

7051 GAGAGAAAAA ACATACTGGA GGAGTCTCTG TX5TGAACTGG TTGCGAAACA 

7101 ATTGAAGCAA CATCAGAATA CTATGGAGGA CAAGTTTATT GTCTGCTTGA 

7151 ACAAAGTGAC CAAGAGCTTC CCTCCTCTTG CAGACAGGTT CATGAATGCT 

7201 GTGTTCTTTC TGCTGCCAAA ATTTCATGGA GTGTTGAAAA CACTCTGTCT 

7251 GGAGGTGGTA CTTTGTCGTG TGGAGGGAAT GACAGAGCTG TACTTCCAGT 

7301 TAAAGAGCAA GGACTTCGTT CAAGTCATGA GACATAGAGA TGAAAGACAA 

7351 AAAGTATGTT TGGACATAAT TTATAAGATG ATGCCAAAGT TAAAACCAGT 

7401 AGAACTCCXSA GAACTTCTGA ACCCCGTTGT GGAA1TCGTT TCCCATCCTT 

7451 CTACAACATG TAGGGAACAA ATGTATAATA TTCTCATOTG GATTCATGAT 

7501 AATTACAGAG ATCCAGAAAG TGAGACAGAT AATGACTCCC AGGAAATATT 

7551 TAAGTTGGCA AAAGATGTGC TGATTCAAGG ATTGATCGAT GAGAACCCTG 

7601 GACTTCAATT AATTATTCGA AATTTCTGGA GCCATGAAAC TAGGTTACeT 

7651 TCAAATACCT TGGACCGGTT GCTGGCACTA AATTCCTTAT ATTCTCCTAA 

7701 GATAGAAGTC CACTOTTTAA GTTTAGCAAC AAATTTTCTG CTCGAAATGA 

7751 CCAGCATGAG CCCAGATTAT CCAAACCCCA TGTTCGAGCA TCCTCTGTCA 

7801 GAATGCGAAT TTCAGGAATA TACCATTGAT TCTGATTGGC GTTTCCGAAG 

7851 TACTGITCTC ACTCCXSATGT TTGTGGAGAC CCAGGCCTCC CAGGGCACTC 

7901 TCCAGACCCG TACCGAGGAA GGGTCCCTCT CAGCTCGCTG GCCAGTGGCA 

7951 GGGCAGATAA GGGCCACCCA GCAGCAGCAT GACTTCACAC TGACACAGAC 

8001 TGCAGATGGA AGAAGGTCAT TTGATTGGCT GACCGGGAGC AGCACTGACC
8051 CGCTGGTCGA CCACACCAGT CCCTCATCTG ACTCCTTGCT GTTTGCCCAC

8101 AAGAGGAGTG AAAGGTTACA GAGAGCACCC TTGAAGTCAG TGGGGCCTGA 

8151 TTTTC>C-CAAA AAAAGGCTC!?-':^ OCCTT^CC.-.G-G G-'X^C'!yiGG1^2 GATAAC^AAG 

8201 TGAAAGGTGC GGCCGGCCGG ACGGACCTAC TACGACTGCG CAGACGGTTT 

8251 ATGAGGGACC AGGAGAAGCT CAX3TTTQATG TATGCCAGAA AT^GGCGTTGC 

8301 TGAGCAAAAA CGAGAGAAGG AAATCAAGAG TGAGTTAAAA ATGAAGCAGG 

8351 ATGCCCAGGT CGTTCTGTAC AGAAGCTACC GGCACGGAGA CCTTCCTGAC

8401 ATTCAGATCA AGCACAGCAG CCTCATCACC CCGTTACAGG CCGTGGCCCA 

8451 aAGGGACCCA ATAATTGCAA AACAGCTCTT TAGCAGCTTG nTTCTGGAA 

8501 TTTTGAAAGA GATGGATAAA TTTAAGACAC TGTCTGAAAA AAACAACATC 

8551 ACTCAAAAGT TGCTTCAAGA CTTCAATCGT TTTCTTAATA CCACCTTCTC 

8601 TTTCrriTCCA CCCriTGTCT CTTGTATTCA GGACA1TAGC TGTCAGCACG 

8651 CAGCCCTGCT GAGCCTCGAC CCAGCGGCTG TTAGCGCTGG TTGCCTGGCC 

8701 AGCCTACAGC AGCCCGTGGG CATCCGCCTG CTAGAGGAGG CTCTGCTCCG 

8751 CCTGCTGCCT GCTGAGCTGC CTGCCAAGCG AGTCCGTGGG AAGGCCCGCC 

8801 TCCCTCCTGA TGTCCTCAGA TGGGTGGAGC TTGCTAAGCT GTATAGATCA 

8851 AITCGAGAAT ACGACGTCCT CCGTGGGATT TTTACCAGTG AGATAGGAAC 

8901 AAAGCAAATC ACTCAGAGTG CATTATTAGC AGAAGCCAGA AGTGATTATT 

8951 CTGAAGCTGC TAAGCAGTAT GATGAGGCTC TCAATAAACA AGACTGGGTA 

9001 GATGGTGAGC CCACAGAAGC CGAGAAGGAT TTTIGGGAAC TTGCATCCCT 

9051 TGACTGTTAC AACCACCTTG CTGAGTGGAA ATCACTTGAA TACTGTTCTA 

9101 CAGCCAGTAT AGACAGTGAG AACCCCXTCAG ACCTAAATAA AATCTGGAGT 

9151 GAACCATTTT ATCAGGAAAC ATATCTACCT TACATGATCC GGAGCAAGCT 

9201 GAAGCTGCTG CTCCAGGGAG AGGCTGACCA GTCCCTGCTG ACATTTATTG 

9251 ACAAAGCTAT GCACGGGGAG CTCCAGAAGG CGATTCTAGA GCTTCATTAC 

9301 AGTCAAGAGC TGAGTCTGCT TTACCTCCTG CAAGATGATG TTaACAOAGC 

9351 CAAATATTAC ATTCAAAATG GCATTCAGAG TTTTATGCAG AATTATTCTA 

9401 GTATTGATGT CCTCTTACAC CAAAGTAGAC TCACCAAATT GCAGTCTGTA 

9451 CAGGCTTTAA C^GAAATTCA GGAGTTCATC AGCTTTATAA GCAAACAAGG 

9501 CAATTTATCA TCTCAAGTTC CCCTTAAGAG ACTTCTGAAC ACCTGGACAA 

9551 ACAGATATCC AGATGCTAAA ATGGACCCAA TGAACATCTG GGATGACATC 

9601 ATCACA^ATC GATGTTTCTT TCTCAGCAAA ATAGAGGAGA AGCTTACCCC 

9651 TCTTCCAGAA GATAATAGTA TGAATGTGGA TCAAGATGGA GACCCCAGTG
9701 ACAGGATCGA AGTGCAAGAG CAGGAAGAAG ATATCAGCTC CCTCATCAGG

9751 AGTTCCAAGT TITCCATQAA AATGAAGATG ATAGACAGTG CCCGGAAGCA 

9801 G?iACA*-TTTC TCACTTGCTA T'2A.-J^J7T'.CT GAA'C^GAGCTG CATAAAGAGT

9851 CAAAAACCAG AGACGATTGG CTGGTGAGCT GGGTGCAGAG CTACTGCCGC 

9901 CTGAGCCACT GCCGGAGCCG GTCCCAGGGC TGCTCTGAGC AGGTGCTCAC 

9951 TGTGCTGAAA ACAGTCTCTT TGTTGGATGA GAACAACGTG TCAAGCEACT 

10001 TAAGCAAAAA TATTCTGGCT TTCCGTGACC AGAACATTCT CTTGGGTACA 

10051 ACTTACAGGA TCATAGCGAA TGCTCTGAGC AGTGAGCCAG CCTGCdTGC 

10101 TGAAATCGAG GAGGACAAGG CTAGAAGAAT CTTAGAGCTT TCTGGATCCA 

10151 GTTCAGAGGA TTCAGAGAAG GTGATCGCGG GTCTGTACCA GAGAGCATTC 

10201 CAGCACCTCT CTGAGGCTGT GCAGGCGGCT GAGGAGGAGG CCCAGCCTCC 

10251 CTCCTGGAGC TGTGGGCCTG CAGCTGGGGT GATTGATGCT TACATGACGC 

10301 TGGCAGATTT CTGTGACCAA CAGCTGCGCA AGGAGGAAGA GAATGCATCA 

10351 GTTACTGATT CTGCAGAACT GCAGGCGTAT CCAGCACTTG TGGTGGAGAA 

10401 AATGTTGAAA GCTTTAAAAT TAAATTCCAA TGAAGCCAGA TTGAAGITTC 

10451 CTAGATTACT TCAGATTATA GAACGGTATC CAGAGGAGAC TTTGAGCCTC 

10501 ATGACAAAAG AGATCTCTTC CGTTCCCTGC TGGCAGTTCA Ta^iGCTGGAT 

10551 CAGCCACATG GTGGCCTTAC TGGACAAAGA CCAAGCCGTT GCTGTTCAGC 

10601 ACTCTGTGGA AGAAATCACT GATAACTACC CGCAGGCTAT TGTTTATCCC 

10651 TTCATCATAA GCAGCGAAAG CTATTCCTTC AAGGATACTT CTACTCGTCA 

10701 TAAGAATAAG GAGTTTGTGG CAAGGATTAA AAGTAAGTTG GATCAAGGAG 

10751 GAGTGATTCA AGATTTTATT AATGCCTTAG ATCAGCTCTC TAATCCTGAA 

10801 CTGCTCTTTA AGGATTGGAG CAATGATGTA AGAGCTGAAC TAGCAAAAAC 

10851 CCCTGTAAAT AAAAAAAACA TTGAAAAAAT GTATGAAAGA ATGTATGCAG

10901 CCTTGGGTGA CCCAAAGGCT CCAGGCCTGG GGGCCTTTAG AAGGAAGTTT 

10951 A1TCAGACTT TTGGAAAAGA ATTTGATAAA CATTTTGGGA AAGGAGGTTC 

11001 TAAACTACTG AGAATGAAGC TCAGTGACTT CAACGACATT ACCAACATGC 

11051 TACTTTTAAA AATGAACAAA GACTCAAAGC CCCCTGGGAA TCTGAAAGAA 

11101 TGTTCACCCT GGATGAGCGA CTTCAAAGTG GAGTTCCTGA GAAATGAGCT 

11151 GGAGATTCCC GGTCAGTATG ACGGTAGGGG AAAGCCATTG CCAGAGTACC 
11201 ACGTGCGAAT CGCCGGGTTT GATGAGCGGG TGACAGTCAT GGCGTCTCTG 
11251 CGAAGGCCCA AGCGCATCAT aJVTCCGTGGC CATGACGAGA GGGAACACCC 
11301 TTTCCTGGTG AAGGGTGGCG AGGACCTGCG GCAGGACCAG CGCGTGGAGC
11351 AGCTCTTCCA GGTCATGAAT GGGATCCTGG CCCAAGACTC CGCCTGCAGC
11401 CAGAGGGCCC TGCAGCTGAG GACCTATAGC GTTGTGCCCA TGACCTCCAG 
11 ''-5'' rrr^trirri'nfTiT, 'rm^ri'Tyr^c TTC;AA.^_-_T>.C TG"! 'VT-.CCTTG ?_Al!rGACCTTC 
11501 TTTTCAACAC CATGTCCCAA GAGGAGAAGG CGGCTTACCT GAGTGATCCC 
11551 AGGGCACCGC CGTGTGAATA TAAAGATTCG CTGACAAAAA TGTCAGGAAA
11601 ACATCATCTT GGAGCTTACA TGCTAATGTA TAAGGGCGCT AATCGTACTG 
11651 AAACAGTCAC GTCTTTTAGA AAACGAGAAA GTAAAGTGCC TGCTGATCTC 
11701 TTAAAGCGGG CCTTCGTGAG a\TGAGTA.CA AGCCCTGAGG CITTCCfTGGC 
11751 GCTCCGCTCC CACTTCGCCA GCTCTCACGC TCTGATATGC ATCAGCCACT 
11801 GGATCCTCGG GATTGGAGAC AGACATCTGA ACAACTTTAT GGTGGCCATG 
11851 GAGACTCGCG GCGTGATCGG GATCGACTTT GGGCATGCGT TTGGATCCGC 
11901 TACACAGTTT CTGCCAGTCC CTGAGTTOAT GCCTTTTCGG CTAACTCGCC 
11951 AGTTTATCAA TCTGATGTTA CCAATGAAAG AAACGGGCCT TATGTACAGC 
12001 ATCATGGTAC ACGCACTCCG GGCCTTCCGC TCAGACCCTG GCXTTGCTCAC 
12051 CAACACCATG GATGTGTTTG TCAAGGAGCC CTCCTTTGAT TGGAAAAATT 
12101 TTGAACAGAA AATGCTGAAA AAAGGAGGGT CATGGA1TCA AGAAATAAAT 
12151 GTTGCTGAAA AAAATTGGTA CCCCCGACAG AAAATATGTT ACGCTAAGAG 
12201 AAAGTTAGCA GGTGCCAATC CAGCAGTCAT TACTTGTaAT GAGCTACTCC 
12251 TGGGTCATGA GAAGGCCCCT GCCTTCAGAG ACTATGTGGC TGTGGCACGA 
12301 GGAAGCAAAG ATCACAACAT TCGTGCCCAA GAACCAQAGA GTGGGCTTTC 
12351 AGAAGAGACT CAAGTGAAGT GCCTGATGGA CCAGGCAACA GACCCCAACA 
12401 TCCTTGGCAG AACCTGGGAA GGATGGGAGC CCTGGATGTG AGGTCTQTGG 
12451 GAGTCTGCAG ATAGAAAGCA TTACATTGTT TAAAGAATCT ACTATACTTT
12501 GGTTGGCAGC ATTCCATOAG C1V3ATTTTCC TGAAACACTA AAGAGAAATG 
12551 TCTTTTGTGC TACAGTTTCG TAGCATGAGT TTAAATCAAG ATTATGATGA 
12601 GTAAATGTGT ATGGGTTAAA TCAAAGATAA GGrTTATAGTA ACATCAAAGA 
12651 TTAGGTGAGG ITTATAGAAA GATAGATATC CAGGCTTACC AAAGTATTAA 
12701 GTCAAGAATA TAATATGTGA TCAGCTTTCA AAGCATTTAC AAGTGCTGCA 
12751 AGTTAGTGAA ACAGCTGTCT CCGTAAATGG AGGAAATGTG GGGAAGCCTT 
12801 GGAATGCCCT TCTGGTTCTG GCACATTGGA AAGCACACTC AGAAGGCTTC 
12851 ATCACCAAGA TTTTGGGAGA GTAAAGCTAA GTATAGTTGA TGTAACATTG 
12901 TAGAAGCAGC ATAGGAACAA TAAGAACAAT AGGTAAAGCT ATAATTATGG 
12951 CTTATATTTA GAAATGACTG CATTTGATAT TTTAGGATAT TTTTCTAGGT 



13001 'rrTTTCCTTT CATTTTATTC TCTTCTAGTT TTGACATTTT ATGATAGA.TT 

13051 TCCTCTCTAG AAGGAAACGT CTTTEATTTAG GAGGGCAAAA ATTTTGGTCA

13101 TAGO-TTXTLtC TTTTGCTATT CC>Ji_TCT>.CA ACTGGAAC1A.T ACATAAAAGT 

13151 GCTTTCCATT GAATTTGGGA TAACTTCAAA AATCCCATC3G TTGTTGTTAG 

13201 GGATAGTACT AAGCAITICA GTTCCACGAG AATAAAAGAA ATTCCTATTT 

13251 GW^TGAATT CCTCATTTGG AGGAAAAAAA GCATGCATTC TAGCACAACA 

13301 AGATGAAATT ATGGAATACA AAAGTGGCTC CTTCCCATGT GCAGTCCCTG 

13351 TCCCCCCCCG CCAGTCCTCC ACACCCAAAC TGTTTCTGAT TGGCTTTTAG 

13401 CTTTTTGTTG ' I ' lTl ' l ' l ^ i ' ITr TCCTTCTAAC ACTTGTATTT GGAGGCTCTT 

13451 CTGTGATTTT GAGAAGTATA CTCTTGAGTG TTTAATAAAG TTTmrcCA 

13501 AAAGTA
Location/Qualifiers

1..3304

/organism="Homo sapiens"
<1..2449

/note="Ku (p70/p80) subunit mRNA (alt.)"
<1..3304

/note="Ku (p70/p80) subunit mRNA (alt.)"
22..31

/note="ribosome binding site"
28..2226

/note="Ku (p70/p80) subunit"

/codon_start=1

/db_xref="PID:g307094"

/translation="MVaSGNKAAVVIiCMDVGPTM.?KSI PGIESPFSQAKJCv— TM^VOR
OVTAENKDEIALVLFGTTCTONPLSCWDQYOKITVHBHLMLPDFDLLEDISSKIQPOS 

LQFFtPFSLGKeDGSCDRGDGPFaLGGKGSSrPLlCGITEQQXSGLErVKMVMISLEGE 
DGLDSIYS FSESLRKU:VFKKI ERHS IHWPCRLTI G3NLSI RlAA VK S I LQERVlCKr»-J 
TVVDAKTLK.KEDIQKETVYCLMDDDBTSVLKEDI10GFRYGSDIVPf£KVDE£QMKYK 
SECKCFSVLaFCKSSQVQRRF?KGNQVI,KVFAJUlDDSAAAVALSSLIHALDOLDKVJ.I 
\^YAVPKRAN?QVGVAF?HIKHNYECLVYV0LPn<E2IJt.QyXFSSLKNSK.=<YAPTEAQ 
LNAVDALrDSWSLAJCXDE:<TI>TljEDLFPTTXlPNPRPOia.FQClJJHRAL 
QOHIWKMLNPPAEVTTiCSOIPLSKrKTLFPLIEAKKKDOVTAQE-FQDKHEDGFTASK 
LKTEQGGAHFSVSSI^GSVTSVGSVKPAENFRVLVKQKKXSFEEASKQLIKHISQrL 
DTNETTPY FM KS I DCI lUVFREEAI KF SE EQRFNNFLIULQ ESCVEI KQLNH rvJE I WQ 
ITLITKEEASGSSVTAEEAKKFIAPiOJKPSGDTAAVFSECCIJVDIJLLDMI"
2431..2436

/note="poly-A signal (alt.)"
BASE COUNT 972 a 655 c 735 g 942 t

ORIGIN

1 cgaccaaagc occtgaggAC cggcaacatg gtgcggtcgg ggaataaggc agctgttgts 
61 ccgtgcatgg acgtgggcct tdccatgagc aactccattc cbggtataga acccccatct 
121 gaacaagcaa agaaggcgac aaccatgttt gtacagcgac aggtgtccgc tgagaacaag 
181 gatgagaccg cttcagcccc gcttggtaca gatggcaccg acaatcccct ccecggcagg 
241 gaccagtatc ogaacaccac agcgcacaga catccgatgc caccagacct cgacttgctg 
301 gaggacaccg aaagcaaaat ccaaccaggt cctcoacagg ctgacttccc ggaescacca
361 atcgcgagca cggatgtgac ccaacacgaa acaataggaa agaagctcga gaagaggcae. 
421 actgaaatac ccaccgacct cagcagccga tccagcaaaa gccagctgga catcacaact 
481 catagcccga agaaacgtga cacctccccg caattcttct tgcccctcte acttggcaog
541 gaagatggaa gcggggacag aggagatggc eectttcgcc taggtggcca cgggcccccc 
601 tttccactaa aaggaattac cgaacagcaa aaagaaggtc ttgagacagt gaaaacggtg 
661 atgacatctc tagaaggcga agatgggccg gacgaaacct ctCcactcag cgagagcccg 
721 agaoaaccgc gcgccttcaa gaaaattgag aggcatccca ttcactggcc ccgccgactg 
781 accattggcc ccaacttgcc tacaaggact gcagcctata aatcgottcc acaggagaga 
841 gttaaaaaga cttggacagc cgtggatgca aaaaccctaa aacaagaaga tacacaaaaa
901 gaaacagttc atcgcttaaa cgatgacgat gaaactgaag tt::caaaaga ggacatcatt 
961 caagggttcc gctacggaag cgacatagtt ccctcctcca aagcggatga ggaacaaatg 
1021 aaatacaaat cggaggggaa gcgcttctct attttgggat cttgcaaatc tcctcaggtc 
1081 cagagaagat tcctcatggg aaatcaagcc ccaaaggtct ctgcagcaag agatgacgag
1141 gcagccgcag tcgcaccttc ctccccgacc catgctttgg acgacccaga catggcggcc 
1201 acagcccgac acgcccatga caaaagagec aatcctcaag tcggcgtggc ctttcctcac
1261 accaagcaca accacgaacg tccagtgtac gcgcagccgc ctttcatgga agacttgcgg
1321 c«ataca.tgc tctcaccctt gaaaaacagc aagaaacacg cccccaccga ogcncogctg 
1381 aatgccgctg acgctctaat tgaccceatg agctcggcaa agaaagacga gaagscagoc 
1441 accctcgaag actcgttccc aaccaccaaa accccaaacc ctcgacccca aagactatcc 
1501 cagcgcccge cgcacagagc ttcacatccc cgggagcctc cneecccaat ccagcagcet 
1561 acctggaaca cgccgaatcc tcccgccgag gtgaceacaa aaagtcaoac ccccccetct 
1621 aaaataaaga ccccccctcc tccgattoaa gccoagaaaa aggaccaagt gaccgcccag 
1681 gaaatccccc aagacaaeca tgaagatgga cctocagcca aaaaattaaa gcccgagcaa 
1741 gggggagccc actccagcgc ctccagcctg gctgaaggca gcgtcacctc cgtcggaagc
1801 gcgaaccccg ctgaaaactt ccgtgttcta gcgaaacaga agaaggccag cttugagga©
1861 gcgagcaacc agctcacaaa tcacatcgaa cagcttccgg atactaacga aacaccgtac
1921 cctacgaaga gcotagaccg catccgagcc ttccgggaag aagccattaa gtcctcagaa 
1981 gagcagcacc ccaacaactc cccgaaagcc cttcaagaga aagcggaaat toaacaatta 
2041 aaccacccct ggga^atcgc tgcccaggat gganccaccc tgaccaccaa figaggaagcc
2101 tccggaagcc ccgtcacagc cgaggaagcc aaaaagtctc uggcccccaa agacaaacca 
2161 agcggagaca cagcagctgt acttgaagaa ggtggcgacg cggacgacct atcggacacg 
2221 acataggtcg cggacgtacg gggaatctaa gagagctgcc accgccgtaa cgccggg^tgc 
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3821 

2Bei 



ccaC9gcggc 
cccccagaaa 
3001 gccccccact 
cccccggccc 
caccaaacfta 
cgagccccag 



3061 
3121 
3181 



3241 cacccccatft 
3301 *ccc 



coAgccggac 
tggaggcgga 
atcCagaccc 
cccccgcggc 
gaccccttcc 
Ccacgcggaa 
cgcccgaccc 
ccceacaaoc 
ccgaggcccc 
ffsgcgggaag 
sgcc ccacga 
cccao&aacc 
cgcaagcccc 
cccccatcec 
gaaaataaca 
cccccccceg 
agccgxcacc 



gcggccaccc 
ccacecaat 
cacacaagc 
ctcactgacc 
cgacctagca 
ccacggcacg 
cccaacagct 
agccgggcca 
c cage caeca 
gggagcacaa 
aga&aagacc 
oagaaACCca 
aaagag&gaa 
cacccaagcc 
aagcccccac 
gcgcacagac 
aacacacagc 




iggagcc 
igcggaa 
laagagc 
;gcacacc 
caccagtgac 
gccaccgatg 
gtcacaactt 
ggaaaaccac 
cattacccca 
ccccccccca 
ctctggccca 
czccagcagc 
agcttcgcca 
asctttcacc 
cccctctccc 
ccctcggcac 
cccgcacacg 



aaaaccccaa 
cgaacacaca 
catcgccacc 
acacacacgc 
agttgcagcc 
agccccccaA 
gcgccgagca 
gggcaaggac 
cccccgcaca 
cactcctccc 
acccccccca 
cagaa^caca 
cattaaaaca 
cccgtggatg 
cctcgcccrc 
ccagccaccc 
caacaccaaa 



gaaaccccca 
cacacacact 
ttccggxcsg 
tccgaagccc 
ctcgcgacgc 
ccccccccag 
agcagcagca 
ggacccsccc 
teggcgaccc 
aageagcgag 
^dccagrgaa 
ccaccccacc 
cccaggcaac 
gcgtcccccc 
atccccgccc 
ctgcccccag 
ggcaca^cg
LOCUS S38729 2123 bp mRNA 08-SEP-1992
DEFINITION Ku autoantigen p70 subunit [human, mRNA, 2123 nt].
ACCESSION S38729
NID g250496
SOURCE human.
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes;
Vertebrata; Eutheria; Primates; Cata
REFERENCE 1 (bases 1 to 2123)
AUTHORS Griffith,A.J., Craft,J., Evans,J., Mimori,T. and Hardin,J.A.
TITLE Nucleotide sequence and genomic structure analyses of the p70
subunit of the human Ku autoantigen: evidence for a family of genes
encoding Ku (p70)-related polypeptides
JOURNAL Mol. Biol. Rep. 16 (2), 91-97 (1992)
MEDLINE 92301477
COMMENT GenBank staff at the National Library of Medicine created this
entry [NCBI gibbsq 107205] from the original journal article.
This sequence comes from Fig. 3.
FEATURES Location/Qualifiers
source 1..2123
/organism="Homo sapiens"
gene 25..1854
/gene="Ku autoantigen p70 subunit"
CDS 25..1854
/note="This sequence comes from Fig. 3."
/codon_start=1
/product="Ku autoantigen, p70 subunit"
/db_xref="PID:g250497"
/translation="HSGWBSYYXTEGDEEAEEEQSESTIiEASGDVKYSGRDSLjrFLVOA.
SrAMFESQSEDELTPFTJMS IQCIQSWISKl I SSDRDUJVVVFYCTEKDKMSVNFWl I
YVU3EL DNPGXKHI LSbDQ FKGQQGQKRFQDKMCKGSDVSLSEVtWVCAKLF SDVQF K
WSKKRIMLFrNEDNPHGNDSAKASiL\aTKACri^flDTOIFIJ31JIHI/XKFWrDZSI/?T'a
Dl I S 1 AE0EDLRVHFE3S S KL EDLLRlCVRAltBt RSCftALSaUCUU^UKDI VI SVGI YN L
VQKALK? PPIKLYRSrrNEFVKTKTRTrN^STGGLLLPSDTKRSQ I VGSRQ I IL EXSET
EELKRFDDPGLHLKCFKPLVLLXKHKVLKPSLFVVtESSLVIGSSTLFSALCrKCIjEK
EVAALCHYTPaRNIPPYFVAliVPQSESLDOOKIQVTPTOFQLVTl-PFADDKRKMPFTE
K I KATPEQVCSXMK.^ I VEKLRFTYRS D9 F EITS VI:Q0H5Rm.EA1^0IiKEP EOA^
F KV^AiOfSKLGSL VDErKELVy?9DYN PMKVTKRKHDNBGSO SKStPKVEy SEE E LKT
HI SKGTLGKPTVPKUCEACRAYGLKSGLXKQEL£.EALrKHFQD"
BASE COUNT 635 a 455 c 569 g 464 t
ORIGIN

1 gcgccaaagt gageagtagc coftcfi^tca gggegggagc catattacaa aaccaaggjc
61 gacgaagaao cagaggaaga acaagaagag aaccttgaag caagtsgagA ctataaacat
121 ccaggaagag atagtetgat ntttttggtt gargcctcca aggctatgtt tgaatctcag
181 agcgaagatg agtbgacacc tztcgacacg agcaccc&gt gtazccaaag tgtgtacacc
241 agtaagatca taagcagtga ccgagatccc tcggccgcgg tgccccacgg taccgagaaa
301 gecaaaaatt cagtgeattt ::aaaaatatt cacgccttac eggagctgga taatccaggc
361 gcaaaacgaa ttctagagct tgaccagctc aaggggcagc agggacaaaa acgtttccao
421 gacacgacgg gccacggatc cgactactca cccagtgoag cgctgcgggc ctgcgccaac
481 ctctctagtg atgtccaatt caagatg»gc cacaagagga teatpctgct caccaatgaa
541 gacaaccccc atggcaacga c&gcgccaaa gccagccggg ccaggaccaa agccggcgac
601 ctccgagaca caggcaccct ccctgacttg atgcacctga ageaacccgg gggctctgac
661 atatccttgt tccacagaga taccatcagc atagcagagg atgcggaccc cagggcfccac
721 tttgaggaat ccagcaagct agaagacccg t^gcggaagg tccgcgccaa ggagaccagg
781 aagcgagcac tcagcaggtt aaagctgaag ctcaacaaag azacagcgat ctctgtgggc
841 atttacaatc tggtccagaa ggctctcaag cctcctccaa caaagcccta tcgggaaaca
901 aatgaaccag tpaonaccaa gacccggacc ttcaacacaa gcacaggcgg tccgetcccg
961 cctagcgata ccaagoggcc ccagatccoc gggagtcgtc agatcatact ggagaaagag
1021 gaaacagoag agccaaaacg gtttgatgat ccaggtctga tgctcatggg cttcaagccg
1081 ctgffcaccgc tgaagaaaca ccaceacctg aggccccccc cg=tcgcgca cccagaggag
1141 ccgccggtga tcgggagctc aaccctgttc agtgctccgc tcotcaagtg tctggagaag
1201 gaggttgcag cactgcgcag acacacaccc cgcaggaaca cccctectta ccttgtggcc
1261 ttggcgccac aggaagaaga gctggacgac cagaaaattc aggtgactcc tccaggctcc
1321 cagctggtcc ttttaccctt cgccgacgat aaaaggaaga tgccccttac tgaaaaaatc
1381 atggcaaccc cagagcaggc gggeaagatg aaggctaccg tsgagaagct ccgctccaca
1441 cacagaagcg acagettcga gaaccccgtg ctgcagcagc acttcagga^ cctggaggcc
1501 ttggcettgg acctgatgga gccggaacaa gcagtggaec tgacattgcc caaggctgaa
1561 gcaacgaata aaagactggg ctccttggcg gatgagttta aggagctcgt ccacccacca
1621 gattacaatc ct^aagggaa agttaccaag agaaaacacg ataatgaagg tcctggsagc
1681 aaaaggccca aggtggagta ctcagaagag gagccgaaga cccacatcag ce&gggtacg
1741 ctgGscaagc tc&czgzacc cetgccgaaa gaggcecgcc gggctcacgg gccgaagagc
1801 ggtctgaaga agcaggagct gctggaagcc ctcaccaagc acccceagga ctgticcagag
1861 gccgcgcgtc cagctgccct tccgcagtgc ggceaggctg cctggeccca tcctcagcca
1921 gctaaaacgc gcctccccco agcca^gaag agtctacccg acocaagtcg agggaectta
1981 tgcttttgag gctttctgtt gccacgatoa tggtgtagcc ccccc&cttt gctgttccct
2041 ttctttaccgc ctgaataaag agccctaagt ttgtactaaa aaaaaaaaaa aaaoaaaaaa
2101 aaaaaaaaaa aaaaaaaaaa a&a






